Multilingual Support - Generic String protocol


Last updated at 3:48 pm UTC on 23 September 1999

unabridged - please summarize

Author Marcel Weiher

Subject: Re: Unicode support
Date: Tue, 21 Sep 1999 16:20:28 +0200
From: Marcel Weiher

> From: "Peter William Lount"
>
> I agree with you that we shouldn't be concerned with how strings store
> their characters if that's all that is to be stored in a string.

I don't see why the restriction. How a string stores whatever it
stores is never anybody's business, as with any other object. Whether
it stores character objects, LZW-compressed variable strings, UTF-8,
whatever shouldn't matter to its clients.
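
As a rough illustration of this point (a Python sketch, not code from
the original discussion): the class names Utf8String and
CompressedString and the methods size and char_at are hypothetical, and
zlib stands in for LZW. Two strings with very different storage can
look identical to a client:

```python
import zlib


class Utf8String:
    """Hypothetical string that stores its text as UTF-8 bytes."""

    def __init__(self, text):
        self._data = text.encode("utf-8")

    def size(self):
        return len(self._data.decode("utf-8"))

    def char_at(self, index):
        return self._data.decode("utf-8")[index]


class CompressedString:
    """Hypothetical string that stores its text zlib-compressed
    (zlib is an assumption standing in for LZW)."""

    def __init__(self, text):
        self._data = zlib.compress(text.encode("utf-8"))

    def _text(self):
        return zlib.decompress(self._data).decode("utf-8")

    def size(self):
        return len(self._text())

    def char_at(self, index):
        return self._text()[index]


def reversed_text(s):
    """A client that uses only size and char_at; storage never leaks out."""
    return "".join(s.char_at(i) for i in range(s.size() - 1, -1, -1))
```

The client function works unchanged on either class, which is all that
"none of anybody's business" requires. The sketch decodes on every
access for brevity; a real implementation would be smarter about that.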

> It does
> mean that strings are based on "byte/double byte encodings" and not on
> general "object oriented" concepts. So we end up with many "encoding
> types" of strings. This is probably necessary given the reality of
> different encoding systems. However, it's not very general. Having a
> GeneralString that is entirely independent of ANY encoding system while
> being able to convert to any encoding system is a very powerful idea.

Yes, having 'GeneralString' as an additional 'encoding' that any
string is required to be able to convert itself to seems useful. Once
again, how this is actually stored is simply none of anybody's
business. Adding a class that uses this as its native encoding is
also good. Making this the only implementation would be suicide
for many applications.

> Also the GeneralString could hold more than just "characters" if
> characters are actual objects instead of bytes. Any object, like an icon
> or graphic, could be put into the string as long as it responds to the
> "character protocol". For example, an HTMLink object might respond with
> the "characters" that make up the link info. An icon would display
> itself. An accounting total object would show the "total" as numbers.
> Any of these "character objects" would be able to be linked back to
> their original object - a plain character or an HTMLink or an accounting
> total object - so you can easily create "hyper links" in text.

These shouldn't actually be character objects, but simply formatting
objects (more like words than characters, even better would be lists
of words). I recently did some experiments with the NSText systems,
and found that for many cases the implementation of embedded objects
as special characters is not good enough. One problem is that single
objects may represent multiple words in the output, which would have
to be line-wrapped etc. While it is possible to fake this with
NSText, it is a lot more convoluted than it should be.

Equating "Text" with a series of characters is the fundamental
problem. It is a series of objects, some of which may be represented as
words, which in turn may consist of characters (rough
approximation). Introducing "SuperCharacters" doesn't solve the
fundamental problem of treating text as a sequence of characters.
That doesn't mean that it isn't appropriate in many situations.
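
To make the distinction concrete, here is a minimal Python sketch
(not from the original discussion, and not how NSText works). The
classes Word and Link, the words method and the wrap function are all
hypothetical names. Text is a list of objects, each of which
contributes zero or more words, and line wrapping operates on those
words, so a single embedded object can break across lines:

```python
class Word:
    """A plain word made of characters."""

    def __init__(self, chars):
        self.chars = chars

    def words(self):
        return [self.chars]


class Link:
    """An embedded object that is rendered as several words.
    The target is a placeholder, not a real address."""

    def __init__(self, label, target):
        self.label = label
        self.target = target

    def words(self):
        return self.label.split()


def wrap(items, width):
    """Greedy line wrapping over the words each item contributes."""
    lines, current = [], ""
    for item in items:
        for word in item.words():
            candidate = word if not current else current + " " + word
            if current and len(candidate) > width:
                lines.append(current)  # line is full, start a new one
                current = word
            else:
                current = candidate
    if current:
        lines.append(current)
    return lines


text = [Word("See"), Link("the Squeak home page", "placeholder-target"), Word("today")]
for line in wrap(text, 14):
    print(line)
```

With a width of 14 the link's label is split between the first and
second lines, which is exactly the case that is awkward when the link
has to pretend to be one special character.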

> NeXTStep/OpenStep (now Apple) has an amazing Text and Character system.
> There is no doubt that they have done their homework very well. They
> have an Attributed String object that performs some of the above
> functions. Any professional text system should have at least the
> capabilities of the OpenStep text system.

Yes, that is definitely a minimum standard. However, there are many
points where it needs to be improved. Another example where Apple's
text system is poor is the handling of very large texts. For these
sorts of situations, it should provide a much simpler and less
resource-intensive configuration.

> In conclusion, an object oriented text system should be based upon an
> object oriented string class that stores characters and other objects,
> not bytes.

No. It should contain various implementations of the "string"
concept that have different tradeoffs where size, generality and
speed are concerned. However, all of these should conform to a
generic string protocol, which includes accessing the contents as
GeneralCharacters.
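
A minimal Python sketch of what such a protocol could look like
follows. It is not taken from any existing system: the names
GenericString, general_characters, ByteString and from_string are
assumptions, and a one-character Python str stands in for a
GeneralCharacter. The compact implementation and the fully general
one both conform, and anything written against the protocol works
with either:

```python
from abc import ABC, abstractmethod


class GenericString(ABC):
    """Hypothetical generic string protocol."""

    @abstractmethod
    def general_characters(self):
        """Return the contents as a list of general characters."""

    def __len__(self):
        return len(self.general_characters())

    def __eq__(self, other):
        # Strings compare by content, whatever their storage.
        if not isinstance(other, GenericString):
            return NotImplemented
        return self.general_characters() == other.general_characters()


class ByteString(GenericString):
    """Compact and fast: one byte per character, Latin-1 only."""

    def __init__(self, text):
        # Raises UnicodeEncodeError for text outside Latin-1.
        self._bytes = text.encode("latin-1")

    def general_characters(self):
        return list(self._bytes.decode("latin-1"))


class GeneralString(GenericString):
    """Fully general: one object per character, at a cost in size."""

    def __init__(self, characters):
        self._characters = list(characters)

    def general_characters(self):
        return list(self._characters)

    @classmethod
    def from_string(cls, other):
        """Convert any conforming string to the general representation."""
        return cls(other.general_characters())
```

Here GeneralString is just one implementation among several, so an
application that needs its generality can use it while others keep
the cheaper ByteString.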

> The objects stored in this general string must conform to the
> "character protocol". A set of "conversion" objects that know how to
> convert between "character byte encodings" and "the general object
> characters" are required and are a very powerful notion.

> This is a valid
> design just as the design you are promoting is a valid design.

The crucial difference, IMHO, is that my proposal includes yours.

> The key
> point is to make the "string" object totally object oriented in its
> implementation instead of basing it upon a "byte encoding".

This is fine for one particular string object with a specific set of
requirements. It is not OK for others. A "one size fits all"
implementation simply is not appropriate for all situations.

Marcel