Squeak
Multilingual support - Character objects
Last updated at 3:31 pm UTC on 23 September 1999
From: "Peter William Lount"
Subject: Re: Unicode support
Date: Wed, 15 Sep 1999 12:22:44 -0700

Hi,

I've never liked the fact that strings are made up of bytes. This is not
an object-oriented approach to strings. It was a "space optimization" and
is a throwback to the days of limited-memory systems.

What about going to the most "general internal character" representation:
each character in a string is a REAL object instance.
GeneralCharacterStrings (for lack of a better name at the moment) are then
made up of 32-bit pointers to character objects. Each character object is
configured with information about how to represent it in different character
encodings. These encodings allow for conversion to and from the "general
internal character" representation. This would also allow conversion
between any two encodings. A GeneralCharacterString could then contain a
mixture of characters from any language, or any special characters from any
encoding.
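
To make the idea concrete, here is a minimal sketch in Python rather than
Smalltalk. Every name in it is hypothetical. It also assumes two things the
proposal does not specify: that a character's identity is its Unicode code
point, and that Python's built-in codecs stand in for the per-character
encoding information. It only illustrates converting between two encodings
through the general representation and mixing characters from different
source encodings in one string.

```python
class GeneralCharacter:
    """One character as a real object (hypothetical class)."""

    def __init__(self, code_point):
        # Assumption: the Unicode code point identifies the character.
        self.code_point = code_point

    def encoded(self, encoding):
        """Return this character's bytes in the named encoding."""
        # Assumption: Python's codecs replace per-character encoding tables.
        return chr(self.code_point).encode(encoding)


class GeneralCharacterString:
    """A sequence of references to character objects (hypothetical class)."""

    def __init__(self, characters):
        self.characters = list(characters)

    @classmethod
    def from_bytes(cls, data, encoding):
        """Convert from an external encoding to the general representation."""
        return cls(GeneralCharacter(ord(ch)) for ch in data.decode(encoding))

    def to_bytes(self, encoding):
        """Convert from the general representation to an external encoding."""
        return b"".join(c.encoded(encoding) for c in self.characters)

    def __add__(self, other):
        return GeneralCharacterString(self.characters + other.characters)

    def __len__(self):
        return len(self.characters)


# Latin-1 bytes in, UTF-8 bytes out, via the general representation.
latin = GeneralCharacterString.from_bytes(b"caf\xe9", "latin-1")
as_utf8 = latin.to_bytes("utf-8")

# One string holding characters that arrived in two different encodings.
russian = GeneralCharacterString.from_bytes("Да".encode("koi8_r"), "koi8_r")
mixed = latin + russian
```

Encoding one character at a time only works for stateless encodings; an
encoding that emits a byte-order mark or shift sequences would need to be
handled at the string level instead.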

Yes, this would take up more space per character (32 bits vs. 8, 16, or
21 bits), but it would be simpler and faster for string operations. Each
GeneralCharacter would be a unique instance, just like the existing 256
single-byte Character instances in Smalltalk.
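
The "unique instance" part could look roughly like the following Python
sketch. The class name, the registry, and the helper function are all
hypothetical, and keying the registry on a Unicode code point is my
assumption. The point is that asking for the same character twice returns
the same object, so characters can be compared by identity, while each slot
in a string costs one pointer.

```python
class GeneralCharacter:
    """A character with exactly one instance per code point (hypothetical)."""

    _instances = {}  # registry: code point -> the single shared instance

    def __new__(cls, code_point):
        existing = cls._instances.get(code_point)
        if existing is None:
            # First request for this character: create and remember it.
            existing = super().__new__(cls)
            existing.code_point = code_point
            cls._instances[code_point] = existing
        return existing


def pointer_string_bytes(length, pointer_bits=32):
    """Bytes needed for the pointer slots of a string of `length` characters."""
    return length * pointer_bits // 8


a1 = GeneralCharacter(0x41)
a2 = GeneralCharacter(0x41)
b = GeneralCharacter(0x42)

same_object = a1 is a2          # identity comparison is enough
ten_char_cost = pointer_string_bytes(10)   # 40 bytes, versus 10 for bytes
```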

Peter William Lount
peter@smalltalk.org
http://www.smalltalk.org