HtmlTokenizer
Last updated at 7:57 am UTC on 16 September 2017
In category: 'Etoys-Squeakland-Network-HTML-Tokenizer' in Squeak 6.0a.

This class takes a text stream and produces a sequence of HTML tokens.

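The source stream must support #peek. Create a tokenizer with:

 HtmlTokenizer on: aStream

For example, evaluating the following in a workspace prints each token on the Transcript:
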
 | tokenizer htmlSource |
 htmlSource :=  '<h1>The title of my report</h1><p>This report is about ...</p>'.

 tokenizer := HtmlTokenizer on: htmlSource readStream.
 Transcript clear.
 [tokenizer atEnd] whileFalse: [Transcript show: tokenizer next printString; cr]

Output on the Transcript:
{HtmlTag:<h1>}
{HtmlText:The title of my report}
{HtmlTag:</h1>}
{HtmlTag:<p>}
{HtmlText:This report is about ...}
{HtmlTag:</p>}



HtmlTokenizer is used by the HtmlParser class.

The token types are:
 HtmlToken printHierarchy

 ProtoObject #()
	Object #()
		HtmlToken #('source')
			HtmlComment #()
			HtmlTag #('isNegated' 'name' 'attribs')
			HtmlText #('text')
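
As a sketch of how these token classes can be told apart, the following workspace snippet collects the names of all opening tags in a small document. It uses only #isTag, #isNegated and #name, the same messages the tokenizer's own #next method sends to its tokens. The variable names and the sample HTML are illustrative.

 | tokenizer token names |
 tokenizer := HtmlTokenizer on: '<h1>The title</h1><p>Some text</p>' readStream.
 names := OrderedCollection new.
 [tokenizer atEnd] whileFalse: [
 	token := tokenizer next.
 	"next may answer nil when there are no more tokens, so guard against it"
 	(token notNil and: [token isTag and: [token isNegated not]])
 		ifTrue: [names add: token name]].
 Transcript show: names printString; cr

Closing tags are skipped by testing #isNegated, which is how HtmlTag distinguishes </h1> from <h1>.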

Implementation of HtmlTokenizer>>next

next 
	"return the next HtmlToken, or nil if there are no more"
	|token|

	"branch, depending on what the first character is"
	self atEnd ifTrue: [ ^nil ].
	self peekChar = $< 
		ifTrue: [ token := self nextTagOrComment ]
		ifFalse: [ token := self nextText ].


	"return the token, modulo modifications inside of textarea's"
	textAreaLevel > 0 ifTrue: [
		(token isTag and: [ token name = 'textarea' ]) ifTrue: [
			"textarea tag--change textAreaLevel accordingly"

			token isNegated
				ifTrue: [ textAreaLevel := textAreaLevel - 1 ]
				ifFalse: [ textAreaLevel := textAreaLevel -2 ].

			textAreaLevel > 0
				ifTrue: [ 
					"still inside a <textarea>, so convert this tag to text"
					^HtmlText forSource: token source ]
				ifFalse: [ "end of the textarea; return the tag"  ^token ] ].
			"end of the textarea"

		"inside the text area--return the token as text"
		^HtmlText forSource: token source ].

	(token isTag and: [ token isNegated not and: [ token name = 'textarea' ]]) ifTrue: [
		"beginning of a textarea"
		inTextArea := true.
		^token ].
		

	^token
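
The comments in the method above describe a nesting counter: an opening <textarea> raises the level, a closing one lowers it, and tokens are converted to HtmlText while the level is positive. The listing as shown subtracts 2 for a nested opening tag and sets inTextArea rather than textAreaLevel when a textarea begins, so it may not behave the way its comments suggest. The workspace sketch below shows only the counter logic the comments appear to intend, run over hand-written tag events. It is an illustration under that assumption, not code from the class; events, depth and trace are hypothetical names.

 | events depth trace |
 "each event is {tag name. isNegated}; hand-written stand-ins for HtmlTag tokens"
 events := {
 	{'textarea'. false}.
 	{'b'. false}.
 	{'textarea'. false}.
 	{'textarea'. true}.
 	{'textarea'. true}}.
 depth := 0.
 trace := OrderedCollection new.
 events do: [:each |
 	each first = 'textarea' ifTrue: [
 		"an opening tag goes one level deeper, a closing tag one level back, never below zero"
 		depth := each last
 			ifTrue: [(depth - 1) max: 0]
 			ifFalse: [depth + 1]].
 	trace add: depth].
 Transcript show: trace printString; cr

The recorded depths are 1 1 2 1 0: the <b> tag and the nested <textarea> are seen while the depth is positive, so a tokenizer following this scheme would return them as text, and only the final </textarea> brings the depth back to 0.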