Multilingual Support - Multilingual enhancement to GNU Emacs



Last updated at 2:19 am UTC on 24 September 1999

[23-Sep-1999 /OHSHIMA Yoshiki]

wrote to the Squeak mailing list

Please note that the design of my implementation is heavily influenced by "Mule" (Multilingual enhancement to GNU Emacs), whose development began more than 12 years ago and which is still being aggressively improved.

Roughly speaking, the character representation in my implementation is somewhat similar to the SmallInteger/LargePositiveInteger integration. The ISO-8859-1 characters are represented in the same way as the current Character, and the others are represented as objects with a 30-bit value field. Currently, there is no assistance from the VM, so you can test it with a vanilla VM.
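
As a rough illustration only (this is not the Squeak code, and it is written in Python rather than Smalltalk), the two-tier idea can be sketched as below. The names `LatinChar`, `WideChar`, and `make_char` are hypothetical stand-ins, and the sketch treats the 30-bit field as a single opaque number because the post does not say how its bits are divided.

```python
from dataclasses import dataclass

LATIN1_LIMIT = 0x100           # ISO-8859-1 covers code values 0..255
VALUE_BITS = 30                # width of the value field mentioned in the post
VALUE_LIMIT = 1 << VALUE_BITS  # first value that no longer fits in the field


@dataclass(frozen=True)
class LatinChar:
    """Hypothetical stand-in for the existing single-byte Character."""
    value: int


@dataclass(frozen=True)
class WideChar:
    """Hypothetical stand-in for a character object with a 30-bit value field."""
    value: int


def make_char(value: int):
    """Choose the compact or the wide representation from the value alone,
    much as integer arithmetic yields a SmallInteger or a LargePositiveInteger."""
    if not 0 <= value < VALUE_LIMIT:
        raise ValueError("value does not fit in 30 bits")
    if value < LATIN1_LIMIT:
        return LatinChar(value)   # ISO-8859-1 range: keep the existing form
    return WideChar(value)        # everything else: the wider object
```

In this sketch the caller never chooses a class directly; `make_char(0x41)` yields the compact form and `make_char(0x3042)` the wide one, which mirrors how the two integer classes are selected transparently.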

Because I have found many glitches in the implementation, please don't consider it "final". (I think "MultiString" and "MultiCharacter" may not be good names :-)

I suppose you know the early history of the Unicode standard. I believe that the committee was about to decide on a 4-byte representation, but a few loud-voiced companies, including MS and Apple, overturned that conscientious decision. IMHO, six years of work can't straighten out a failure made at the start, when the starting point itself is wrong.

On a system like Squeak, where the glyph of a character should be controlled by the system itself, the character should know how to represent itself. This is why I think the representation should carry more information than a 16-bit representation can.

One more thing I'd like to say is that Unicode could be a "local" encoding in my framework. So much software assumes Unicode that Squeak should be able to support it. However, that local encoding would not have glyphs, because it's Unicode.