Canonical Ordering Algorithm (Unicode)

Last updated at 9:21 am UTC on 10 December 2015

http://www.unicode.org/versions/Unicode8.0.0/UnicodeStandard-8.0.pdf

p. 137

D108 Reorderable pair:

Two adjacent characters A and B in a coded character sequence are a Reorderable Pair if and only if ccc(A) > ccc(B) > 0.
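
The test in D108 can be sketched in a few lines of Python. This is only an illustration, assuming that `unicodedata.combining()` from the standard library is an acceptable source for ccc values (it reflects the Unicode version bundled with the interpreter, not necessarily 8.0). The helper name `is_reorderable_pair` is made up for this page.

```python
import unicodedata


def is_reorderable_pair(a: str, b: str) -> bool:
    """Return True if adjacent characters a, b satisfy ccc(a) > ccc(b) > 0.

    Hypothetical helper; ccc values come from unicodedata.combining().
    """
    return unicodedata.combining(a) > unicodedata.combining(b) > 0


# U+0307 combining dot above has ccc 230, U+0323 combining dot below has ccc 220.
print(is_reorderable_pair("\u0307", "\u0323"))  # True: 230 > 220 > 0
print(is_reorderable_pair("\u0323", "\u0307"))  # False: 220 is not greater than 230
print(is_reorderable_pair("\u0307", "d"))       # False: ccc(d) is 0
```

Note that a character with combining class 0 can never be part of a Reorderable Pair, in either position.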

D109 Canonical Ordering Algorithm:

In a decomposed character sequence D, exchange the positions of the characters in each Reorderable Pair until the sequence contains no more Reorderable Pairs.

- In effect, the Canonical Ordering Algorithm is a local bubble sort that guarantees that a Canonical Decomposition or a Compatibility Decomposition will contain no subsequences in which a combining mark is followed directly by another combining mark that has a lower, non-zero combining class.
- Canonical ordering is defined in terms of application of the Canonical Ordering Algorithm to an entire decomposed sequence.
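
The "local bubble sort" reading of D109 can be sketched as follows. This is a minimal, unoptimised illustration rather than a production normaliser: it assumes the input is already decomposed, takes ccc values from Python's `unicodedata.combining()`, and uses a function name (`canonical_order`) invented for this page.

```python
import unicodedata


def canonical_order(decomposed: str) -> str:
    """Exchange Reorderable Pairs until none remain (sketch of D109).

    Assumes `decomposed` is already a decomposed character sequence.
    """
    chars = list(decomposed)
    swapped = True
    while swapped:  # repeat passes until a full pass makes no exchange
        swapped = False
        for i in range(len(chars) - 1):
            a, b = chars[i], chars[i + 1]
            # Reorderable Pair: ccc(A) > ccc(B) > 0
            if unicodedata.combining(a) > unicodedata.combining(b) > 0:
                chars[i], chars[i + 1] = b, a
                swapped = True
    return "".join(chars)


ordered = canonical_order("d\u0307\u0323")
print([f"U+{ord(c):04X}" for c in ordered])  # ['U+0064', 'U+0323', 'U+0307']
```

Because the comparison requires ccc(B) > 0, characters with combining class 0 are never moved, so the sort stays local to each run of combining marks. Marks with equal combining class are also never exchanged, which keeps their relative order intact.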

### Examples

For example, canonical decomposition of the sequence

U+1E0B latin small letter d with dot above, U+0323 combining dot below

would result in the sequence

U+0064 latin small letter d, U+0307 combining dot above, U+0323 combining dot below,

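a sequence which is not yet in canonical order: ccc(U+0307) = 230 and ccc(U+0323) = 220, so the two marks form a Reorderable Pair and must be exchanged to give U+0064, U+0323, U+0307.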
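
The example can be checked against a library implementation. The snippet below is a sketch that assumes Python's `unicodedata.normalize("NFD", ...)`, which performs canonical decomposition followed by canonical ordering; the `describe` helper is made up for display purposes.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """Hypothetical display helper: code point and ccc for each character."""
    return [f"U+{ord(c):04X} ccc={unicodedata.combining(c)}" for c in s]


source = "\u1e0b\u0323"          # d with dot above, combining dot below
unordered = "\u0064\u0307\u0323"  # decomposed, but not yet canonically ordered
nfd = unicodedata.normalize("NFD", source)

print(describe(unordered))  # ['U+0064 ccc=0', 'U+0307 ccc=230', 'U+0323 ccc=220']
print(describe(nfd))        # ['U+0064 ccc=0', 'U+0323 ccc=220', 'U+0307 ccc=230']
```

Normalising the unordered sequence with NFD gives the same result as normalising the original, since both are canonically equivalent.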

Most decompositions for Unicode strings are already in canonical order.

Another example is a vowel followed by a nasalisation mark and an acute diacritical mark (taken from Combining Diacritical Marks Range (Unicode)).

The diacritical marks in such a sequence may or may not already be in canonical order.