Multilingual Support - Number hierarchy as an example
Last updated at 4:35 pm UTC on 15 September 2003
Smalltalk-80 proved that we don't need to special-case the most common scenario conceptually in order to have full functionality and efficiency. The Number hierarchy, and particularly the Integer hierarchy, is a case in point: Smalltalk seamlessly integrates the special case (SmallInteger) as a breathtakingly fast and almost cost-free operation, while providing broad flexibility for the more general case. The Number architecture provides the essential functionality upon which the rest is based.
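
As a rough illustration of that idea, here is a minimal sketch in Python rather than Smalltalk. The names `Int`, `SmallInt`, `LargeInt` and `make_int` are hypothetical, and the 31-bit range is an assumption modelled on 32-bit Squeak images. Python integers are already arbitrary precision, so the two classes only model the dispatch between the fast special case and the general case, not the actual storage.

```python
# Assumption: small integers are 31-bit tagged values, as in 32-bit Squeak.
SMALL_MIN = -(2 ** 30)
SMALL_MAX = 2 ** 30 - 1


class Int:
    """Shared protocol: clients only use `+` and `value`."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __add__(self, other: "Int") -> "Int":
        # The result is re-classified, so overflow promotes to LargeInt
        # and a shrinking result demotes back to SmallInt.
        return make_int(self.value + other.value)


class SmallInt(Int):
    """The common, cheap case (hypothetical stand-in for SmallInteger)."""


class LargeInt(Int):
    """The general case (hypothetical stand-in for the large integer classes)."""


def make_int(value: int) -> Int:
    """Pick the representation; callers never choose a class themselves."""
    if SMALL_MIN <= value <= SMALL_MAX:
        return SmallInt(value)
    return LargeInt(value)
```

A client that adds one to `make_int(SMALL_MAX)` gets a `LargeInt` back without having asked for it, which is the kind of transparency the paragraph above describes.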

Marcel seems to have this right by focusing on the essential question: "What is a string?" Once we have the essential protocol, the rest revolves around making an intelligent hierarchy, with an eye toward making the special case or cases (ASCII/UTF or whatever) efficient as hell and cost-free in terms of function.

We have already seen a number of extensions of String (Text) and the experiment was worth watching. OpenDocs provides another model.

What we need to do is think bigger first: what is the essence of the String? Then we can ask how to provide all the encodings (and the conversions between them).

Subject: Unicode support
Date: Tue, 21 Sep 1999 11:21:19 -0700
From: Todd Blanchard

How a string stores whatever it stores is never anybody's business, as with any other object. Whether it stores character objects, LZW-compressed variable strings, UTF-8, or whatever else shouldn't matter to its clients.

I agree. Insisting that we just have a String implemented as a sequence of Character objects (where Character presumably has multiple implementations, similar to the way SmallInteger vs LargeInteger is handled) is a naive approach that we are not likely to be able to afford.

The system should be able to take advantage of space/time optimizations where appropriate. For instance, it could use a single-byte representation when all characters in the string are in ISO-8859-1.
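
The optimization described above could look something like the following sketch, written in Python for brevity. The class `CompactString` and its method `storage_bytes` are hypothetical names invented for this illustration, and the choice of a 4-byte fallback is an assumption; a real implementation might use several widths or a variable-length encoding.

```python
class CompactString:
    """Hypothetical string whose storage format is hidden from clients."""

    def __init__(self, text: str) -> None:
        # ISO-8859-1 covers exactly the code points 0..255.
        if all(ord(ch) <= 0xFF for ch in text):
            self._width = 1                        # one byte per character
            self._data = text.encode("latin-1")
        else:
            self._width = 4                        # assumed wide fallback
            self._data = text.encode("utf-32-le")  # fixed width, no BOM

    def __len__(self) -> int:
        return len(self._data) // self._width

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self._width
        code_point = int.from_bytes(self._data[start:start + self._width], "little")
        return chr(code_point)

    def storage_bytes(self) -> int:
        """Exposed only to make the space saving visible in this sketch."""
        return len(self._data)
```

Clients index and measure both kinds of string the same way; only `storage_bytes` reveals that a string made up entirely of ISO-8859-1 characters takes a quarter of the space.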