UTF-32

Last updated at 11:46 am UTC on 12 December 2015

UTF-32 represents each Unicode code point with exactly 4 bytes (32 bits) [1].

Squeak implements UTF-32 for Unicode code points with the Character class.

The UTF-32 form of a code point is a direct representation of that code point's numerical value.
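
As a rough illustration of "direct representation", the following Python sketch (not Squeak code) packs each code point's numeric value into a 4-byte big-endian integer. The helper name utf32_be_encode is hypothetical and chosen only for this example; the result is checked against Python's built-in "utf-32-be" codec.

```python
import struct

def utf32_be_encode(text: str) -> bytes:
    """Encode text as big-endian UTF-32 without a byte order mark.

    Hypothetical helper for illustration: each code point is written
    as its numeric value in one 32-bit unsigned integer.
    """
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)

sample = "A\u20ac\U0001F600"  # U+0041, U+20AC, U+1F600
encoded = utf32_be_encode(sample)

# The hand-rolled encoding agrees with Python's built-in codec.
assert encoded == sample.encode("utf-32-be")

# Each group of 4 bytes is simply the code point's value.
print(encoded.hex(" ", 4))  # prints: 00000041 000020ac 0001f600
```

Every character occupies the same 4 bytes whether it is ASCII or lies outside the Basic Multilingual Plane, so no lookup tables or length prefixes are needed.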
The main advantage of UTF-32, versus variable-length encodings like UTF-8, is that the Unicode code points are directly indexable.
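
The indexing difference can be sketched in Python (again, not Squeak code). Both function names below are hypothetical and exist only for this comparison: with UTF-32 the n-th code point sits at byte offset 4n, whereas with UTF-8 the bytes must be scanned from the start.

```python
def code_point_at_utf32(data: bytes, index: int) -> int:
    """Return the code point at `index` in big-endian UTF-32 data.

    Constant time: the offset is computed directly as index * 4.
    """
    offset = index * 4
    if index < 0 or offset + 4 > len(data):
        raise IndexError(index)
    return int.from_bytes(data[offset:offset + 4], "big")

def code_point_at_utf8(data: bytes, index: int) -> int:
    """Return the code point at `index` in valid UTF-8 data.

    Linear time: lead bytes are counted from the start, because each
    character may occupy 1 to 4 bytes.
    """
    seen = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte, so a new character starts here
            seen += 1
            if seen == index:
                if byte < 0x80:
                    length = 1
                elif byte < 0xE0:
                    length = 2
                elif byte < 0xF0:
                    length = 3
                else:
                    length = 4
                return ord(data[offset:offset + length].decode("utf-8"))
    raise IndexError(index)

text = "A\u20ac\U0001F600"
assert code_point_at_utf32(text.encode("utf-32-be"), 2) == 0x1F600
assert code_point_at_utf8(text.encode("utf-8"), 2) == 0x1F600
```

The trade-off is space: the same three characters take 12 bytes in UTF-32 and 8 bytes in UTF-8.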
Algorithms are straightforward to implement because every code point occupies a fixed 32 bits; see, for example, String asDecomposedUnicode.