Last updated at 11:46 am UTC on 12 December 2015
In UTF-32, each Unicode code point is represented with 4 bytes.
Squeak implements UTF-32 for Unicode code points with the Character class.
- The UTF-32 form of a code point is a direct representation of that code point's numerical value.
- The main advantage of UTF-32, versus variable-length encodings like UTF-8, is that the Unicode code points are directly indexable.
- Implementing algorithms is straightforward because every code unit has a fixed width of 32 bits; see, for example, String asDecomposedUnicode.
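
The points above can be illustrated with a minimal sketch in Python. It is not Squeak code and does not show how Squeak stores characters internally. The helper names `utf32_be_encode` and `code_point_at` are hypothetical, and big-endian byte order without a byte order mark is an assumption made for the example. Each code point is written out as its numerical value in 4 bytes, so the n-th code point sits at byte offset 4 * n.

```python
import struct

def utf32_be_encode(text: str) -> bytes:
    """Encode text as UTF-32 (big-endian, no BOM): 4 bytes per code point."""
    # The 4 bytes are simply the code point's numerical value.
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)

def code_point_at(data: bytes, index: int) -> int:
    """Return the code point at position `index` using plain offset arithmetic."""
    (value,) = struct.unpack_from(">I", data, 4 * index)
    return value

sample = "a\u20ac\U0001F600"           # 'a', EURO SIGN, GRINNING FACE
encoded = utf32_be_encode(sample)

# Fixed width: 3 code points always take 3 * 4 bytes.
utf32_length = len(encoded)

# Variable width: the same text in UTF-8 takes 1 + 3 + 4 bytes, so finding
# the n-th code point there requires scanning from the start.
utf8_length = len(sample.encode("utf-8"))

second = code_point_at(encoded, 1)     # direct lookup, no scanning
```

In a variable-length encoding such as UTF-8, the same lookup would need to walk through the preceding bytes, which is the trade-off the list describes: UTF-32 uses more space in exchange for direct indexing.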
See also: https://en.wikipedia.org/wiki/UTF-32; SIL, "Mapping code points to Unicode encoding forms"