Last updated at 7:57 am UTC on 16 September 2017
In category: 'Etoys-Squeakland-Network-HTML-Tokenizer' in Squeak 6.0a.

This class takes a text stream and produces a sequence of HTML tokens.

It requires its source stream to support #peek.

 HtmlTokenizer on: aStream

 | tokenizer htmlSource |
 htmlSource :=  '<h1>The title of my report</h1><p>This report is about ...</p>'.

 tokenizer := HtmlTokenizer on: htmlSource readStream.
 Transcript clear.
 [tokenizer atEnd] whileFalse: [Transcript show: tokenizer next printString; cr]

Output on the Transcript:
{HtmlText:The title of my report}
{HtmlText:This report is about ...}

HtmlTokenizer is used by the HtmlParser class.

The token types are:
 HtmlToken printHierarchy

 ProtoObject #()
	Object #()
		HtmlToken #('source')
			HtmlComment #()
			HtmlTag #('isNegated' 'name' 'attribs')
			HtmlText #('text')
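To see which of these token classes a given input produces, you can tally the tokens by class in a Workspace. This is a sketch. It uses only HtmlTokenizer on:, atEnd and next as in the example above, plus the standard Bag class. The sample markup is made up for illustration, and the exact contents of the resulting Bag depend on how your image's tokenizer splits the input.

```smalltalk
| tokenizer counts |
"Tally the tokens of a small, made-up HTML fragment by token class"
tokenizer := HtmlTokenizer on: '<!-- a note --><p>Hello</p>' readStream.
counts := Bag new.
"next answers an HtmlComment, HtmlTag or HtmlText; record its class"
[tokenizer atEnd] whileFalse: [counts add: tokenizer next class].
"print-it on counts to see how many tokens of each class were produced"
counts
```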

Implementation of HtmlTokenizer>>next

	"return the next HtmlToken, or nil if there are no more"
	| token |

	"branch, depending on what the first character is"
	self atEnd ifTrue: [ ^nil ].
	self peekChar = $< 
		ifTrue: [ token := self nextTagOrComment ]
		ifFalse: [ token := self nextText ].

	"return the token, modulo modifications inside of textareas"
	textAreaLevel > 0 ifTrue: [
		(token isTag and: [ token name = 'textarea' ]) ifTrue: [
			"textarea tag--a closing tag leaves one level, a nested opening tag enters one more"

			token isNegated
				ifTrue: [ textAreaLevel := textAreaLevel - 1 ]
				ifFalse: [ textAreaLevel := textAreaLevel + 1 ].

			textAreaLevel > 0
				ifTrue: [ 
					"still inside a <textarea>, so convert this tag to text"
					^HtmlText forSource: token source ]
				ifFalse: [ "end of the textarea; return the tag" ^token ] ].

		"inside the text area--return the token as text"
		^HtmlText forSource: token source ].

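	(token isTag and: [ token isNegated not and: [ token name = 'textarea' ]]) ifTrue: [
		"beginning of a textarea: enter level 1 so that the following tokens take the branch above"
		textAreaLevel := 1.
		^token ].

	"any other token outside a textarea is returned unchanged"
	^token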
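The textarea bookkeeping above can be exercised with a fragment that nests markup inside a <textarea>. This is a sketch that assumes a version of the method in which the opening <textarea> tag raises textAreaLevel to 1. Under that assumption, the <b> and </b> tags between <textarea> and </textarea> should come back as HtmlText, while the <p> after it is an ordinary HtmlTag. An image whose method differs may print something else. The sample markup is made up for illustration.

```smalltalk
| tokenizer tokens |
"Collect every token of a made-up fragment that has markup inside a textarea"
tokenizer := HtmlTokenizer on: '<textarea><b>raw</b></textarea><p>after</p>' readStream.
tokens := OrderedCollection new.
[tokenizer atEnd] whileFalse: [tokens add: tokenizer next].
"Show each token's class and original source on the Transcript"
Transcript clear.
tokens do: [:each |
	Transcript show: each class name; show: ' '; show: each source printString; cr]
```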