Last updated at 8:40 am UTC on 7 December 2015
Unicode has been supported since Squeak 3.8.
How could Squeak's support for Unicode be improved?
- Add more translators between Unicode and other character codes.
- Review the existing code and clean it up: some classes have several methods that do the same thing with different code.
- Check the code for correctness and fix it (for example, unicodeToMacRoman actually converts Unicode to Latin-1, not to Mac Roman).
- Improve the performance of the translators. For example, the UTF8 translator creates a write stream, but it then writes bytes to the file system one at a time as it decides whether each Unicode character needs to become a multi-byte UTF-8 sequence. For a text file of several MB, this means millions of file I/O requests to the operating system, which is very slow.
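The performance point above can be illustrated with a small sketch. It is written in Python rather than Smalltalk and does not reproduce Squeak's actual translator classes. `CountingSink`, `write_utf8_byte_at_a_time` and `write_utf8_buffered` are hypothetical names chosen for this example, and the 64K-character chunk size is an arbitrary assumption. The sketch counts write requests for two strategies: issuing one write per output byte, and encoding a block of characters in memory before issuing a single write for that block.

```python
import io


class CountingSink:
    """Hypothetical stand-in for a file handle; counts write requests received."""

    def __init__(self) -> None:
        self.buffer = io.BytesIO()
        self.write_calls = 0

    def write(self, data: bytes) -> int:
        self.write_calls += 1
        return self.buffer.write(data)


def write_utf8_byte_at_a_time(text: str, sink) -> None:
    """Slow pattern: encode each character, then write its bytes one by one."""
    for ch in text:
        for byte in ch.encode("utf-8"):
            sink.write(bytes([byte]))


def write_utf8_buffered(text: str, sink, chunk_chars: int = 65536) -> None:
    """Encode a block of characters in memory, then issue one write per block.

    Slicing a Python str splits on code-point boundaries, so a multi-byte
    UTF-8 sequence is never cut in half between two writes.
    """
    for start in range(0, len(text), chunk_chars):
        sink.write(text[start:start + chunk_chars].encode("utf-8"))


if __name__ == "__main__":
    # 14 characters / 21 UTF-8 bytes per repetition (ASCII, Latin-1 range, CJK).
    sample = "Squeak \u00e9\u00e8 \u65e5\u672c\u8a9e " * 10000
    slow, fast = CountingSink(), CountingSink()
    write_utf8_byte_at_a_time(sample, slow)
    write_utf8_buffered(sample, fast)
    assert slow.buffer.getvalue() == fast.buffer.getvalue()
    print(slow.write_calls, fast.write_calls)  # prints: 210000 3
```

Both strategies produce identical bytes. Only the number of requests sent to the sink differs, and with a real file handle that number determines how often the operating system is called.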
Earlier notes (2006)
Some years ago, Yoshiki Ohshima had a Japanese version of Squeak. More recently, he presented a proposal for Unicode support along with a first implementation. Details are on this page:
Boris Gaertner tried to write an encoding-aware version of Scamper. A first result can be found on this page:
A good next step would be to take the best parts of both proposals and combine them into a unified implementation proposal. When choosing a common implementation policy, we should also consider which changes to the VM are possible or desirable. Many problems are still unsolved. For example, there is currently no way to use an Input Method Editor (IME) to enter Unicode text.