Last updated at 10:06 am UTC on 28 January 2017
The Squeak Character class uses a 32-bit immediate value to encode a Unicode code point. The code point does not need all of those bits, so Squeak uses the remaining high bits for a "leading character" (leadingChar) that indicates the encoding and language environment. This extra field is Squeak specific and is not part of the Unicode standard. Encodings other than Unicode are supported, and every character carries this encoding information.
For more, see the class comment of Character.
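The split between code point and leading character can be illustrated with a little bit arithmetic. The following is a minimal sketch in Python, not Squeak's actual implementation: it assumes only the 16r3FFFFF mask quoted in the history note below (so the low 22 bits hold the character code), and the names pack and unpack as well as the leading character value 5 are hypothetical, chosen for illustration.

```python
# Model of the Squeak character value layout (illustrative only).
# Assumption: the low 22 bits (mask 16r3FFFFF) hold the character code,
# and everything above them holds the leading character.
CHAR_CODE_BITS = 22
CHAR_CODE_MASK = 0x3FFFFF  # written 16r3FFFFF in Smalltalk


def pack(leading_char: int, char_code: int) -> int:
    """Combine a leading character and a character code into one integer."""
    if not 0 <= char_code <= CHAR_CODE_MASK:
        raise ValueError("char_code does not fit in 22 bits")
    if leading_char < 0:
        raise ValueError("leading_char must be non-negative")
    return (leading_char << CHAR_CODE_BITS) | char_code


def unpack(value: int) -> tuple[int, int]:
    """Split an integer into (leading_char, char_code)."""
    return value >> CHAR_CODE_BITS, value & CHAR_CODE_MASK


# With leading character 0, the packed value is the plain code point.
neutral = pack(0, 353)

# A hypothetical non-zero leading character changes the packed value,
# but the character code can still be recovered with the mask.
tagged = pack(5, 353)
tagged_parts = unpack(tagged)
```

With a leading character of 0 the packed value equals the code point itself, which is why a language-neutral character and the corresponding Unicode value coincide.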
Note about history
Chris Cunningham, Thu, Jan 26, 2017 at 11:36 PM
Reply-To: The general-purpose Squeak developers list <firstname.lastname@example.org>
To: The general-purpose Squeak developers list
So, back in 2009, Andreas Raab proposed:
What I would propose to do here is to define that "leadingChar = 0" which currently means "Latin1 encoding, language neutral" is being redefined to "Unicode encoding, language neutral". What this does is that "Character value: 353" and "Unicode value: 353" become the same, if the environment is considered language neutral which by default it would be.
In 2010, he pushed this into Squeak Trunk.
Then, in 2011, there was a conversation where Andreas stated:
On 1/8/2011 2:16 AM, Sean P. DeNigris wrote:
"In Squeak Character encoding, bits above 16r3FFFFF don't encode the
character, but hold information about the language environment and the
encoding which should be used to interpret the charCode. The background of
which is Han unification (http://en.wikipedia.org/wiki/Han_unification)."
How's that as a method comment? Is it really "In Squeak... encoding..." or
does this apply to unicode in general?
It is Squeak specific. Unicode does not have a leading char.