String asDecomposedUnicode
Last updated at 10:01 pm UTC on 10 December 2015
The class String has a method
asDecomposedUnicode
	"Convert the receiver into a decomposed Unicode representation.
	Optimized for the common case that no decomposition needs to take place."
	| lastIndex nextIndex out decomposed |
	lastIndex := 1.	"start of the run of characters not yet copied to out"
	nextIndex := 0.
	[(nextIndex := nextIndex+1) <= self size] whileTrue:[
		decomposed := Unicode decompose: (self at: nextIndex).
		decomposed ifNotNil:[
			"lastIndex is still 1 only before the first decomposition, so the stream is created once, lazily"
			lastIndex = 1 ifTrue:[out := WriteStream on: (String new: self size)].
			"copy the unchanged run before this character, then its decomposition"
			out nextPutAll: (self copyFrom: lastIndex to: nextIndex-1).
			out nextPutAll: decomposed.
			lastIndex := nextIndex+1.
		].
	].
	"if nothing was decomposed, answer the receiver itself without copying"
	^out ifNil:[self] ifNotNil:[
		out nextPutAll: (self copyFrom: lastIndex to: self size).
		out contents]
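Because out is only created once some character decomposes, a string with nothing to decompose is answered as is, without any copying. A quick workspace check of that fast path, assuming a Squeak image whose Unicode decomposition tables are initialized (the variable names are only for illustration, and the accented string is built with Character value: to avoid source-encoding ambiguity):

```smalltalk
| plain accented |
plain := 'abc'.
"'café' with a precomposed U+00E9 as its last character"
accented := 'caf', (String with: (Character value: 16rE9)).
"No character of plain decomposes, so the very same object comes back"
plain asDecomposedUnicode == plain.	"true"
"U+00E9 decomposes into $e plus U+0301, so a new, longer string is answered"
accented asDecomposedUnicode == accented.	"false"
accented size.	"4"
accented asDecomposedUnicode size.	"5"
```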
The method
Unicode decompose: aCharacter
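looks the character up in the Decompositions class variable of Unicode and answers nil when the character has no decomposition. Decompositions is initialized from UnicodeData.txt, part of the Unicode Character Database, using the Decomposition_Mapping property.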
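To make the UnicodeData.txt connection concrete, here is a sketch of how one line of that file maps to a decomposition entry. It parses the line for U+00E9 into a hypothetical local Dictionary named decompositions, keyed by code point. It is not Squeak's actual initialization code, and skipping compatibility mappings (those starting with a <tag>) is a choice made only for this sketch.

```smalltalk
| line stream fields code mappingField mapping decompositions |
"One line of UnicodeData.txt: U+00E9 LATIN SMALL LETTER E WITH ACUTE"
line := '00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9'.
"Split on $; by hand so that empty fields keep their positions (subStrings: skips them)"
stream := ReadStream on: line.
fields := OrderedCollection new.
[stream atEnd] whileFalse: [fields add: (stream upTo: $;)].
"Field 1 is the code point, field 6 the decomposition mapping (space-separated hex)"
code := ('16r', (fields at: 1)) asNumber.
mappingField := fields at: 6.
decompositions := Dictionary new.
"Keep only canonical mappings; compatibility mappings begin with a <tag>"
(mappingField notEmpty and: [mappingField first ~= $<]) ifTrue: [
	mapping := mappingField substrings collect: [:hex | ('16r', hex) asNumber].
	decompositions at: code put: mapping].
decompositions at: 16rE9.	"#(101 769)"
decompositions at: 97 ifAbsent: [nil].	"nil, $a has no mapping"
```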
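Examples:
'ö' asDecomposedUnicode
"answers a two-character string: $o followed by U+0308 COMBINING DIAERESIS"
'é' asDecomposedUnicode asOrderedCollection collect: [:code | code asInteger]
an OrderedCollection(101 769)
Here 101 is $e and 769 (16r301) is U+0301 COMBINING ACUTE ACCENT.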
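A typical reason to decompose is comparing strings that render identically but use different code points. The sketch below compares code points rather than the strings themselves; the helper block codes is hypothetical and only mirrors the asOrderedCollection collect: idiom used in the example above.

```smalltalk
| codes precomposed decomposed |
"Answer the code points of a string as an Array of Integers"
codes := [:aString | (aString asOrderedCollection collect: [:each | each asInteger]) asArray].
"U+00E9 LATIN SMALL LETTER E WITH ACUTE as one character"
precomposed := String with: (Character value: 16rE9).
"$e followed by U+0301 COMBINING ACUTE ACCENT: two characters that render the same"
decomposed := WideString with: $e with: (Character value: 16r301).
codes value: precomposed.	"#(233)"
codes value: decomposed.	"#(101 769)"
(codes value: precomposed asDecomposedUnicode) = (codes value: decomposed).	"true"
```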
Notes
See also
String asPrecomposedUnicode