Normalization test cases (Unicode)
Last updated at 9:09 pm UTC on 18 December 2015
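The database of normalization test cases is a text file, NormalizationTest.txt, whose fields are separated by semicolons. Each field is a sequence of hexadecimal code points separated by spaces, and everything after a "#" is a comment.
The columns (c1, c2, ..., c5) have the following meaning:
c1 = source; c2 = NFC; c3 = NFD; c4 = NFKC; c5 = NFKD
The current version of the file is available at:
http://www.unicode.org/Public/UCD/latest/ucd/NormalizationTest.txt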
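A line of the file can be turned into five Python strings as follows. This is a minimal sketch; the function name parse_line is hypothetical, and it simply skips blank lines, comment lines and "@Part" headers rather than tracking which part a line belongs to.

```python
def parse_line(line):
    """Parse one data line of NormalizationTest.txt.

    Returns None for blank lines, comment lines and @Part headers;
    otherwise returns the five columns (source, NFC, NFD, NFKC, NFKD)
    as Python strings.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data or data.startswith("@"):
        return None
    # Fields are separated by semicolons; the line ends with ';', so the
    # empty element after the last separator is discarded.
    fields = data.split(";")[:5]
    # Each field is a space-separated list of hexadecimal code points.
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]


line = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"
fields = parse_line(line)
print([" ".join(f"{ord(c):04X}" for c in f) for f in fields])
# ['1E0A', '1E0A', '0044 0307', '1E0A', '0044 0307']
```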
@Part0 # Specific cases
#
1E0A;1E0A;0044 0307;1E0A;0044 0307; # (Ḋ; Ḋ; D◌̇; Ḋ; D◌̇; ) LATIN CAPITAL LETTER D WITH DOT ABOVE
...
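The header of the file states conformance invariants that relate the columns of every line: c2 == NFC(c1) == NFC(c2) == NFC(c3) and c4 == NFC(c4) == NFC(c5); c3 == NFD(c1) == NFD(c2) == NFD(c3) and c5 == NFD(c4) == NFD(c5); c4 == NFKC(cX) and c5 == NFKD(cX) for every column cX. A minimal sketch of checking one line against them with Python's standard unicodedata module follows; check_invariants is a hypothetical name, and the result is only meaningful if unicodedata.unidata_version matches the version of the test file.

```python
import unicodedata


def check_invariants(c1, c2, c3, c4, c5):
    """Return True if unicodedata satisfies the NormalizationTest.txt
    invariants for one line (c1..c5 are Python strings)."""
    n = unicodedata.normalize
    return (
        # NFC
        c2 == n("NFC", c1) == n("NFC", c2) == n("NFC", c3)
        and c4 == n("NFC", c4) == n("NFC", c5)
        # NFD
        and c3 == n("NFD", c1) == n("NFD", c2) == n("NFD", c3)
        and c5 == n("NFD", c4) == n("NFD", c5)
        # NFKC: every column maps to c4
        and all(c4 == n("NFKC", c) for c in (c1, c2, c3, c4, c5))
        # NFKD: every column maps to c5
        and all(c5 == n("NFKD", c) for c in (c1, c2, c3, c4, c5))
    )


# The Part 0 line for U+1E0A LATIN CAPITAL LETTER D WITH DOT ABOVE
print(check_invariants("\u1E0A", "\u1E0A", "D\u0307", "\u1E0A", "D\u0307"))  # True
```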
@Part1 # Character by character test
# All characters not explicitly occurring in c1 of Part 1 have identical NFC, D, KC, KD forms.
#
00A0;00A0;00A0;0020;0020; # ( ; ; ; ; ; ) NO-BREAK SPACE
00A8;00A8;00A8;0020 0308;0020 0308; # (¨; ¨; ¨; ◌̈; ◌̈; ) DIAERESIS
...
(more of Part 1)
...
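The Part 1 comment implies a second check: every character that does not appear in c1 of Part 1 must be unchanged by all four forms, i.e. X == NFC(X) == NFD(X) == NFKC(X) == NFKD(X). The sketch below assumes the file has already been read into a list of lines; part1_sources and unlisted_violations are hypothetical helpers, and the tiny sample stands in for the real file, which lists far more characters.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def part1_sources(lines):
    """Collect the code point in c1 of every data line in Part 1."""
    listed, part = set(), None
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments
        if data.startswith("@Part"):
            part = data                         # e.g. "@Part1"
        elif data and part == "@Part1":
            listed.add(int(data.split(";")[0], 16))
    return listed


def unlisted_violations(listed, code_points):
    """Yield code points not in `listed` that at least one
    normalization form changes (each should map to itself)."""
    for cp in code_points:
        if cp in listed:
            continue
        ch = chr(cp)
        if any(unicodedata.normalize(form, ch) != ch for form in FORMS):
            yield cp


sample = [
    "@Part1 # Character by character test",
    "#",
    "00A0;00A0;00A0;0020;0020; # NO-BREAK SPACE",
    "00A8;00A8;00A8;0020 0308;0020 0308; # DIAERESIS",
]
listed = part1_sources(sample)
# U+00AA changes under NFKC but this toy sample does not list it.
print([hex(cp) for cp in unlisted_violations(listed, [0x41, 0xA0, 0xA8, 0xAA])])
# ['0xaa']
```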
@Part2 # Canonical Order Test
#
0061 0315 0300 05AE 0300 0062;00E0 05AE 0300 0315 0062;0061 05AE 0300 0300 0315 0062;00E0 05AE 0300 0315 0062;0061 05AE 0300 0300 0315 0062; # (a◌̕◌̀◌֮◌̀b; à◌֮◌̀◌̕b; a◌֮◌̀◌̀◌̕b; à◌֮◌̀◌̕b; a◌֮◌̀◌̀◌̕b; ) LATIN SMALL LETTER A, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, COMBINING GRAVE ACCENT, LATIN SMALL LETTER B
0061 0300 0315 0300 05AE 0062;00E0 05AE 0300 0315 0062;0061 05AE 0300 0300 0315 0062;00E0 05AE 0300 0315 0062;0061 05AE 0300 0300 0315 0062; # (a◌̀◌̕◌̀◌֮b; à◌֮◌̀◌̕b; a◌֮◌̀◌̀◌̕b; à◌֮◌̀◌̕b; a◌֮◌̀◌̀◌̕b; ) LATIN SMALL LETTER A, COMBINING GRAVE ACCENT, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B
...
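Part 2 exercises canonical ordering: within each run of non-starters (canonical combining class, ccc, not 0) the marks are stably sorted by ccc. Here U+05AE has ccc 228, U+0300 has ccc 230 and U+0315 has ccc 232, so both sources above reorder to 0061 05AE 0300 0300 0315 0062, which is c3. The sketch below implements only the reordering step (canonical_order is a hypothetical name); it is not full NFD, since it performs no decomposition, which these particular inputs do not need.

```python
import unicodedata


def canonical_order(text):
    """Stably sort every run of non-starters (ccc != 0) by ccc.
    Assumes `text` is already fully decomposed."""
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1                      # starters stay where they are
            continue
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal ccc keep their order
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)


src = "a\u0315\u0300\u05AE\u0300b"      # c1 of the first Part 2 line
print(" ".join(f"{ord(c):04X}" for c in canonical_order(src)))
# 0061 05AE 0300 0300 0315 0062
```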
@Part3 # PRI #29 Test
#
09C7 0334 09BE;09C7 0334 09BE;09C7 0334 09BE;09C7 0334 09BE;09C7 0334 09BE; # (ে◌̴া; ে◌̴া; ে◌̴া; ে◌̴া; ে◌̴া; ) BENGALI VOWEL SIGN E, COMBINING TILDE OVERLAY, BENGALI VOWEL SIGN AA
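The Part 3 line keeps all five columns identical because of composition blocking. Written adjacently, U+09C7 and U+09BE canonically compose to U+09CB BENGALI VOWEL SIGN O. U+0334 COMBINING TILDE OVERLAY has ccc 1, and since U+09BE is a starter (ccc 0), the intervening mark blocks the composition. A short check with Python's unicodedata, assuming its data version handles this case as the test file requires:

```python
import unicodedata

e, tilde, aa, o = "\u09C7", "\u0334", "\u09BE", "\u09CB"

# Adjacent vowel signs compose to BENGALI VOWEL SIGN O.
print(unicodedata.normalize("NFC", e + aa) == o)                # True

# With the overlay (ccc 1) in between, 09BE (ccc 0) is blocked,
# so the sequence is left unchanged by every normalization form.
s = e + tilde + aa
print(all(unicodedata.normalize(f, s) == s for f in ("NFC", "NFD", "NFKC", "NFKD")))  # True
print(unicodedata.combining(tilde), unicodedata.combining(aa))  # 1 0
```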