UTF-32
Last updated at 11:46 am UTC on 12 December 2015
Each Unicode character is represented with 4 bytes [1].
Squeak implements UTF-32 for Unicode code points with the Character class.
The UTF-32 form of a code point is a direct representation of that code point's numerical value.
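As a rough illustration of that direct mapping, the following Python sketch (not Squeak code) packs a code point into four big-endian bytes and compares the result with Python's built-in "utf-32-be" codec. The helper name utf32_be is hypothetical and chosen only for this example.

```python
import struct

def utf32_be(code_point: int) -> bytes:
    """Encode one Unicode code point as 4 big-endian bytes (UTF-32BE).

    Hypothetical helper for illustration only.
    """
    # Reject values outside the Unicode range and the surrogate block.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    # The encoded form is simply the numeric value as a 32-bit integer.
    return struct.pack(">I", code_point)

lam = utf32_be(0x03BB)  # U+03BB GREEK SMALL LETTER LAMDA
assert lam == "\u03bb".encode("utf-32-be")
assert int.from_bytes(lam, "big") == 0x03BB
```

Decoding is the reverse step: reading the four bytes back as an integer yields the code point again, with no further transformation.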
The main advantage of UTF-32, versus variable-length encodings like UTF-8, is that the Unicode code points are directly indexable.
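A minimal Python sketch (again, not Squeak code) of what direct indexing means in practice: the n-th code point of UTF-32 data always starts at byte offset 4 * n. The function name code_point_at and the sample text are assumptions made for this example.

```python
def code_point_at(utf32_be_data: bytes, index: int) -> int:
    """Return the code point at position `index` in UTF-32BE data.

    Hypothetical helper for illustration only.
    """
    start = 4 * index  # fixed width: no scanning of earlier characters
    unit = utf32_be_data[start:start + 4]
    if index < 0 or len(unit) != 4:
        raise IndexError("code point index out of range")
    return int.from_bytes(unit, "big")

text = "a\u03bb\U0001F600z"  # four code points
data32 = text.encode("utf-32-be")
data8 = text.encode("utf-8")

assert len(data32) == 4 * len(text)
assert code_point_at(data32, 2) == 0x1F600

# In UTF-8 the same four characters take 1, 2, 4 and 1 bytes, so the
# byte offset of a character depends on everything that precedes it.
assert len(data8) == 8
```

The price of this constant-time lookup is space: the UTF-32 form of the sample text takes 16 bytes, against 8 bytes in UTF-8.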
Algorithms are straightforward to implement because every code point occupies a fixed 32 bits; see, for example, String asDecomposedUnicode.
References
[1] http://www.unicode.org/glossary/#UTF_32
[2] https://en.wikipedia.org/wiki/UTF-32; SIL, Mapping code points to Unicode encoding forms