Canonical Ordering Algorithm (Unicode)
Last updated at 9:21 am UTC on 10 December 2015

p. 137

D108 Reorderable pair:
Two adjacent characters A and B in a coded character sequence are a Reorderable Pair
if and only if ccc(A) > ccc(B) > 0.

D109 Canonical Ordering Algorithm:
In a decomposed character sequence D, exchange the positions of the characters in each Reorderable Pair until the sequence contains no
more Reorderable Pairs.
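
D108 and D109 can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the helper names ccc, is_reorderable_pair and canonical_order are made up for this page, the combining class is taken from the standard library's unicodedata.combining, and the input is assumed to be already decomposed.

```python
import unicodedata


def ccc(ch: str) -> int:
    """Canonical combining class of one character (0 for starters)."""
    return unicodedata.combining(ch)


def is_reorderable_pair(a: str, b: str) -> bool:
    """D108: adjacent A, B are a Reorderable Pair iff ccc(A) > ccc(B) > 0."""
    return ccc(a) > ccc(b) > 0


def canonical_order(decomposed: str) -> str:
    """D109: exchange Reorderable Pairs until none remain.

    Hypothetical helper; assumes `decomposed` is already fully decomposed.
    """
    chars = list(decomposed)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            if is_reorderable_pair(chars[i], chars[i + 1]):
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)


# d, combining dot above (ccc 230), combining dot below (ccc 220)
reordered = canonical_order("\u0064\u0307\u0323")
print([f"U+{ord(c):04X}" for c in reordered])
# ['U+0064', 'U+0323', 'U+0307']
```

Because a pair is only exchanged when ccc(A) is strictly greater than ccc(B), marks with equal combining class keep their relative order, and characters with ccc = 0 are never moved.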


For example, canonical decomposition of the sequence
U+1E0B latin small letter d with dot above, U+0323 combining dot below

would result in the sequence
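U+0064 latin small letter d, U+0307 combining dot above, U+0323 combining dot below

a sequence which is not yet in canonical order: ccc(U+0307) = 230 and ccc(U+0323) = 220, so the last two characters form a Reorderable Pair. Exchanging them gives U+0064, U+0323, U+0307, which is in canonical order.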
Most decompositions for Unicode strings are already in canonical order.
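
The example can be checked with Python's standard unicodedata module. This is a small sketch that relies on the Unicode data shipped with the interpreter; the variable names are chosen for this page only.

```python
import unicodedata

original = "\u1E0B\u0323"  # d with dot above, combining dot below

# Canonical decomposition mapping of U+1E0B as stored in the character database
mapping = unicodedata.decomposition("\u1E0B")
print(mapping)  # 0064 0307

# Combining classes of the decomposed, not yet reordered sequence
unordered = "\u0064\u0307\u0323"
for ch in unordered:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# NFD performs the decomposition and then the canonical ordering
nfd = unicodedata.normalize("NFD", original)
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+0064', 'U+0323', 'U+0307']
```

The combining dot below (ccc 220) ends up before the combining dot above (ccc 230), which is the canonically ordered form of the sequence.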

Another example would be a vowel plus a nasalisation mark and an acute diacritical mark (taken from Combining Diacritical Marks Range (Unicode)).
Depending on the order in which the marks were entered, they may or may not already be in canonical order.