Squeak
Example of a full canonical decomposition (Unicode)
Last updated at 7:53 pm UTC on 9 December 2015
Reference: Unicode Standard Annex #15, Unicode Normalization Forms, http://www.unicode.org/reports/tr15/tr15-18.html#Introduction


1. Take the string made of the code points 00E1 0063 0301 0327 (a-acute, c, combining acute, combining cedilla):

    testString :=
        16r00E1 asCharacter asString,   "00E1 a-acute"
        'c',                            "0063 c"
        16r0301 asCharacter asString,   "0301 combining acute accent"
        16r0327 asCharacter asString.   "0327 combining cedilla"

    testString size
      4
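
For cross-checking outside Squeak, the same string can be built in Python. This is an illustrative swap of language, not part of the Squeak image; Python strings are sequences of code points, so the length also comes out as 4.

```python
# Same four code points as testString above:
# 00E1 a-acute, 0063 c, 0301 combining acute, 0327 combining cedilla.
test_string = "\u00e1" + "c" + "\u0301" + "\u0327"

print(len(test_string))                      # number of code points
print([hex(ord(ch)) for ch in test_string])  # each code point in hex
```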


2. The Unicode Character Database file UnicodeData.txt contains the following relevant fields (other fields are elided):

        code; name; ... canonical class; ... decomposition.
        0061;LATIN SMALL LETTER A;...0;...
        0063;LATIN SMALL LETTER C;...0;...
        00E1;LATIN SMALL LETTER A WITH ACUTE;...0;...0061 0301;...
        0107;LATIN SMALL LETTER C WITH ACUTE;...0;...0063 0301;...
        0301;COMBINING ACUTE ACCENT;...230;...
        0327;COMBINING CEDILLA;...202;...
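
The same fields can be read with Python's standard unicodedata module, which bundles its own copy of the Unicode Character Database: unicodedata.combining() returns the canonical combining class, and unicodedata.decomposition() returns the decomposition field as a string (empty when there is none). This is a cross-check sketch, not the Squeak API; the database version depends on the Python build, but canonical decompositions and combining classes are covered by Unicode's stability policy.

```python
import unicodedata

# Code points from the UnicodeData.txt excerpt above.
code_points = [0x0061, 0x0063, 0x00E1, 0x0107, 0x0301, 0x0327]

for cp in code_points:
    ch = chr(cp)
    # Print code; name; canonical combining class; decomposition,
    # mirroring the fields shown in the excerpt.
    print(f"{cp:04X};{unicodedata.name(ch)};"
          f"{unicodedata.combining(ch)};{unicodedata.decomposition(ch)}")
```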




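3. Applying the canonical decomposition mappings, we get 0061 0301 0063 0301 0327 (a, acute, c, acute, cedilla).
This is because 00E1 (a-acute) has a canonical decomposition mapping to 0061 0301 (a, acute); the other three characters have no decomposition mapping and are left unchanged.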

       testString asDecomposedUnicode asOrderedCollection collect: [:code | code asInteger printStringRadix: 16]  
 
       an OrderedCollection('16r61' '16r301' '16r63' '16r301' '16r327')
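
The decomposition step can be sketched in Python on top of the same data. This is a minimal illustration assuming Python's unicodedata tables; canonical_decompose is a hypothetical helper name. It applies canonical mappings recursively (a mapped character may itself decompose), skips compatibility mappings (those starting with a tag such as <compat>), and does not reorder marks, so its result corresponds to the Squeak output above.

```python
import unicodedata

def canonical_decompose(text):
    """Recursively apply canonical decomposition mappings, without reordering.

    Compatibility mappings (starting with a <tag>) are skipped. Hangul
    syllables decompose algorithmically rather than via the data file,
    so this sketch leaves them unchanged.
    """
    pieces = []
    for ch in text:
        mapping = unicodedata.decomposition(ch)
        if mapping and not mapping.startswith("<"):
            # A canonical mapping is a space-separated list of hex code points.
            expanded = "".join(chr(int(h, 16)) for h in mapping.split())
            pieces.append(canonical_decompose(expanded))
        else:
            pieces.append(ch)
    return "".join(pieces)

test_string = "\u00e1c\u0301\u0327"
print([hex(ord(ch)) for ch in canonical_decompose(test_string)])
```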



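4. Applying the canonical ordering, we get 0061 0301 0063 0327 0301 (a, acute, c, cedilla, acute).
This is because the cedilla has a lower canonical combining class (202) than the acute (230), so the two marks after 'c' are swapped. The positions of 'a' and 'c' are not affected, since they are starters (combining class 0). Reordering only happens within each run of combining marks between starters, so the single acute after 'a' also stays where it is.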
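
The Squeak output in step 3 still lists the acute before the cedilla, so the reordering is sketched here in Python. This is a minimal sketch assuming Python's unicodedata tables; canonical_order is a hypothetical helper name. It stable-sorts each run of non-starters by combining class, which gives the same result as the standard's pairwise exchange of adjacent marks whose classes are out of order, and then compares the result with Python's built-in NFD.

```python
import unicodedata

def canonical_order(text):
    """Stable-sort each run of non-starters (combining class > 0) by class.

    Starters (class 0) act as fixed boundaries and never move.
    """
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        # Find the end of this run of non-starters.
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal classes keep their order.
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)

decomposed = "a\u0301c\u0301\u0327"  # result of step 3
ordered = canonical_order(decomposed)
print([hex(ord(ch)) for ch in ordered])

# Cross-check against Python's built-in NFD of the original string.
print(ordered == unicodedata.normalize("NFD", "\u00e1c\u0301\u0327"))
```

Because sorted() is stable, marks with equal combining classes (for example two acutes) keep their original relative order, as canonical ordering requires.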