Canonical Ordering Algorithm (Unicode)
Last updated at 9:21 am UTC on 10 December 2015
D108 Reorderable pair:
Two adjacent characters A and B in a coded character sequence are a Reorderable Pair
if and only if ccc(A) > ccc(B) > 0, where ccc(X) denotes the canonical combining class of character X.
D109 Canonical Ordering Algorithm:
In a decomposed character sequence D, exchange the positions of the characters in each Reorderable Pair until the sequence contains no
more Reorderable Pairs.
- In effect, the Canonical Ordering Algorithm is a local bubble sort that guarantees that a Canonical Decomposition or a Compatibility Decomposition will contain no subsequences in which a combining mark is followed directly by another combining mark that has a lower, non-zero combining class.
- Canonical ordering is defined in terms of application of the Canonical Ordering Algorithm to an entire decomposed sequence.
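Definitions D108 and D109 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names is_reorderable_pair and canonical_order are chosen here for illustration, and the sketch assumes the input has already been decomposed. The canonical combining class is read with the standard library call unicodedata.combining.

```python
import unicodedata


def is_reorderable_pair(a: str, b: str) -> bool:
    """D108: adjacent characters A, B with ccc(A) > ccc(B) > 0."""
    ccc_a = unicodedata.combining(a)
    ccc_b = unicodedata.combining(b)
    return ccc_a > ccc_b > 0


def canonical_order(decomposed: str) -> str:
    """D109: exchange Reorderable Pairs until none remain.

    Assumes `decomposed` is already a decomposed character sequence.
    Characters with combining class 0 never take part in a swap, so
    the sort stays local to each run of combining marks.
    """
    chars = list(decomposed)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            if is_reorderable_pair(chars[i], chars[i + 1]):
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)
```

Because marks with equal combining classes are never exchanged, their relative order is preserved, which is why a stable exchange-based sort such as bubble sort fits the definition.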
For example, canonical decomposition of the sequence
U+1E0B latin small letter d with dot above, U+0323 combining dot below
would result in the sequence
U+0064 latin small letter d, U+0307 combining dot above, U+0323 combining dot below,
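a sequence which is not yet in canonical order, because ccc(U+0307) = 230 is greater than ccc(U+0323) = 220.
Exchanging this Reorderable Pair gives the canonically ordered sequence
U+0064 latin small letter d, U+0323 combining dot below, U+0307 combining dot above.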
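This example can be checked with Python's standard unicodedata module, whose NFD normalization applies canonical decomposition followed by canonical ordering. The variable names below are illustrative only.

```python
import unicodedata

# U+1E0B latin small letter d with dot above, U+0323 combining dot below
source = "\u1e0b\u0323"

# NFD = canonical decomposition followed by canonical ordering
nfd = unicodedata.normalize("NFD", source)

names = [unicodedata.name(c) for c in nfd]
classes = [unicodedata.combining(c) for c in nfd]

print(names)    # LATIN SMALL LETTER D, COMBINING DOT BELOW, COMBINING DOT ABOVE
print(classes)  # [0, 220, 230]
```

The dot below (class 220) ends up before the dot above (class 230), so the result contains no Reorderable Pair.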
Most decompositions for Unicode strings are already in canonical order.
Another example is a vowel followed by a nasalisation mark and an acute diacritical mark (taken from Combining Diacritical Marks Range (Unicode)).
Depending on the order in which they appear in the original sequence, the diacritical marks may or may not already be in canonical order.