Unicode Support



Last updated at 8:40 am UTC on 7 December 2015

Squeak has supported Unicode since version 3.8.

Class Unicode

How could Squeak's support for Unicode be improved?

Add more translators between Unicode and other character encodings.
Review the existing code and clean it up; some classes have several methods that use different code to do the same thing.
Check the code for correctness and fix it (for example, unicodeToMacRoman actually converts Unicode to Latin-1).
Improve the performance of the translators. For example, the UTF8 translator creates a write stream but then writes to the file system one byte at a time as it decides whether each Unicode character needs to be encoded as a UTF-8 sequence. For a text file of several MB, this results in millions of file I/O requests to the operating system, which is really slow.
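
The performance point in the last item can be illustrated with a small sketch. It is written in Python rather than Squeak, and it is not the actual translator code: the function names, the `sink` parameter and the chunk size of 8192 characters are assumptions made for illustration only. It contrasts issuing one write per encoded byte with encoding a whole chunk of text and handing it over in a single write.

```python
import io


def encode_utf8_unbuffered(text, sink):
    """Write every encoded byte with its own write call (the slow pattern).

    Returns the number of write calls issued to the sink.
    """
    calls = 0
    for ch in text:
        for b in ch.encode("utf-8"):
            sink.write(bytes([b]))  # one request per byte
            calls += 1
    return calls


def encode_utf8_buffered(text, sink, chunk_chars=8192):
    """Encode the text chunk by chunk and issue one write per chunk.

    Chunks are cut on character boundaries, so a multi-byte UTF-8
    sequence is never split across two writes.
    Returns the number of write calls issued to the sink.
    """
    calls = 0
    for start in range(0, len(text), chunk_chars):
        sink.write(text[start:start + chunk_chars].encode("utf-8"))
        calls += 1
    return calls


if __name__ == "__main__":
    sample = "héllo wörld €" * 1000
    slow, fast = io.BytesIO(), io.BytesIO()
    print("unbuffered writes:", encode_utf8_unbuffered(sample, slow))
    print("buffered writes:  ", encode_utf8_buffered(sample, fast))
    print("same output:", slow.getvalue() == fast.getvalue())
```

Both functions produce identical bytes; only the number of write requests differs. With an unbuffered file as the sink, each of those requests would reach the operating system, which is where the cost described above comes from.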

Earlier notes 2006

Some years ago, Yoshiki Ohshima had a Japanese version of Squeak, and he recently presented a proposal for Unicode support together with a first implementation. Details are on this page:
Multilingual Squeak

Boris Gaertner tried to write an encoding-aware version of Scamper. A first result can be found on this page:
http://www.bgaertner.gmxhome.de/UnicodeProject.htm

A good next step would be to select the best of both proposals and create a unified implementation proposal. Choosing a common implementation policy should also involve examining whether changes to the VM are possible or desirable. Many problems remain unsolved. For example, it is currently not possible to use an IME (Input Method Editor) to enter Unicode text.