Multilingual Support - Right-To-Left-Scripts
Last updated at 8:38 am UTC on 1 April 2004
Title: Unicode support
Author: Marcel Weiher
Subject: Re: RE: Unicode support
Date: Wed, 22 Sep 1999 23:22:50 +0200
From: Marcel Weiher
> Consider this:
> hebr3 "This is a fine mess" hebr2 hebr1 hebr0
> if you iterate through the tokens, in what order would you expect to
> get tokens?
> if you iterate through the words, is the order the same?
> I'd argue that in the word case you want:
I'd argue that this latter representation is also the 'natural'
storage organization, with the left-to-right / right-to-left shown at
the top being a purely visual feature and therefore specific to the
layout / display of that particular text / string. Then again, I
know little about Hebrew...
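A small sketch of this point. It assumes a toy model in which each word is an atomic token tagged "R" (right-to-left) or "L" (left-to-right), and it assumes the quoted line is a right-to-left paragraph in which hebr0 is read first. The function names `reverse_runs` and `visual_order` are hypothetical, and the sketch ignores everything else a real bidi implementation handles. Tokens are stored in logical order, so plain iteration yields the reading order. The display order is derived only when the line is laid out.

```python
from itertools import groupby

# Toy model (assumption): each token is (text, direction), where direction
# is "R" for a right-to-left word and "L" for a left-to-right word.
# Lists are kept in logical (reading/storage) order.

def reverse_runs(tokens, direction):
    """Reverse every maximal run of consecutive tokens with the given direction."""
    out = []
    for d, run in groupby(tokens, key=lambda t: t[1]):
        run = list(run)
        out.extend(reversed(run) if d == direction else run)
    return out

def visual_order(tokens, paragraph_rtl):
    """Return tokens in left-to-right display order (token level only)."""
    if paragraph_rtl:
        # Lay the whole line out right to left, then restore each LTR run.
        return reverse_runs(list(reversed(tokens)), "L")
    # In a left-to-right paragraph only the RTL runs are flipped.
    return reverse_runs(tokens, "R")

logical = [("hebr0", "R"), ("hebr1", "R"), ("hebr2", "R"),
           ("This", "L"), ("is", "L"), ("a", "L"), ("fine", "L"), ("mess", "L"),
           ("hebr3", "R")]

print(" ".join(w for w, _ in logical))
# → hebr0 hebr1 hebr2 This is a fine mess hebr3
print(" ".join(w for w, _ in visual_order(logical, paragraph_rtl=True)))
# → hebr3 This is a fine mess hebr2 hebr1 hebr0
```

Iterating over `logical` gives the same order for tokens and for words. Only the display step reorders, and it reproduces the visual line quoted at the top of the message.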
See http://www.unicode.org/unicode/reports/tr9/ (Unicode Technical Report #9, the bidirectional algorithm) for how bidirectional scripts are handled.
The Unicode Standard prescribes a memory representation order known as logical order. When text is presented in horizontal lines, most scripts display characters from left to right. However, there are several scripts (such as Arabic or Hebrew) where the natural ordering of horizontal text in display is from right to left. If all of the text has the same horizontal direction, then the ordering of the display text is unambiguous. However, when bidirectional text (a mixture of left-to-right and right-to-left horizontal text) is present, some ambiguities can arise in determining the ordering of the displayed characters.
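Each character's directional property is available from Python's standard `unicodedata.bidirectional()`. The minimal sketch below uses it to tell whether a string stored in logical order is bidirectional, meaning it contains both strong left-to-right (class L) and strong right-to-left (class R or AL) characters. The helpers `bidi_classes` and `has_mixed_direction` are illustrative names and are not part of any library.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def has_mixed_direction(text):
    """True if the text contains both strong LTR and strong RTL characters."""
    classes = {cls for _, cls in bidi_classes(text)}
    return "L" in classes and bool(classes & {"R", "AL"})

sample = "abc \u05d0\u05d1 12"   # Latin letters, two Hebrew letters, digits
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")   # e.g. U+0061 L, U+05D0 R, U+0031 EN
print(has_mixed_direction(sample))   # True
```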
This section describes the algorithm used to determine the directionality for bidirectional Unicode text. The algorithm extends the implicit model currently employed by a number of existing implementations and adds explicit format codes for special circumstances. In most cases, there is no need to include additional information with the text to obtain correct display ordering.
However, in the case of bidirectional text, there are circumstances where an implicit bidirectional ordering is not sufficient to produce comprehensible text. To deal with these cases, a minimal set of directional formatting codes is defined to control the ordering of characters when rendered. This allows exact control of the display ordering for legible interchange and also ensures that plain text used for simple items like filenames or labels can always be correctly ordered for display.
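The explicit directional formatting characters are ordinary code points and can be written directly into a string. In this minimal sketch the constant names follow the abbreviations used by the bidirectional algorithm. `embed` is a hypothetical helper that shows how a left-to-right label such as a filename could be wrapped so that its internal order holds inside right-to-left text. Later versions of the algorithm (Unicode 6.3 onward) also define isolate characters (LRI, RLI, FSI, PDI), which are generally recommended over embeddings for this purpose.

```python
import unicodedata

# Directional formatting characters from the bidirectional algorithm.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends an embedding or override)
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def embed(text, rtl):
    """Hypothetical helper: fix the direction of `text` with an explicit embedding."""
    return (RLE if rtl else LRE) + text + PDF

# Show each code's official name and its own bidirectional class.
for name, ch in [("LRM", LRM), ("RLM", RLM), ("LRE", LRE), ("RLE", RLE),
                 ("PDF", PDF), ("LRO", LRO), ("RLO", RLO)]:
    print(name, unicodedata.name(ch), unicodedata.bidirectional(ch))

label = embed("report.txt", rtl=False)   # LRE + "report.txt" + PDF
```

The marks (LRM, RLM) are invisible strong characters of class L and R. The embeddings and overrides open a new directional context that PDF closes.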
The directional formatting codes are used only to influence the display ordering of text. In all other respects they should be ignored: they have no effect on the comparison of text, nor on word breaks, parsing, or numeric analysis.
When working with bidirectional text, the characters are still interpreted in logical order; only the display is affected. The display ordering of bidirectional text depends upon the directional properties of the characters in the text.
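Formatting codes must not affect comparison or word breaking. One simple way to honour this, assumed here rather than prescribed by the text, is to strip them before doing either. In this sketch `BIDI_CONTROLS`, `strip_bidi_controls`, and `same_text` are hypothetical names. The set also contains ALM and the isolate characters added in Unicode 6.3, which postdate the text quoted here.

```python
# Bidirectional formatting characters (hypothetical constant name).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM (implicit marks)
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI (Unicode 6.3+)
)

def strip_bidi_controls(text):
    """Drop directional formatting characters, keeping everything else."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)

def same_text(a, b):
    """Compare two strings while ignoring directional formatting."""
    return strip_bidi_controls(a) == strip_bidi_controls(b)

plain = "fine mess"
marked = "\u202bfine\u200f mess\u202c"
print(plain == marked)                      # False: raw code points differ
print(same_text(plain, marked))             # True
print(strip_bidi_controls(marked).split())  # ['fine', 'mess']
```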
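The sketch below is deliberately simplified. It shows how display order can be derived from the characters' directional properties while the stored string stays in logical order. It is not the Unicode Bidirectional Algorithm. It assumes a single left-to-right paragraph and treats every character that is not strongly directional as a neutral, including digits. It also ignores explicit formatting codes, nested embedding levels, and mirroring. As a result, digits inside right-to-left text come out reversed, where the real algorithm keeps them left-to-right. `toy_display_order` is a hypothetical name.

```python
import unicodedata

def toy_display_order(text):
    """Toy visual reordering for one left-to-right line (NOT the full UBA)."""
    # 1. Reduce each character to a strong direction, or None for neutral/weak.
    dirs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):
            dirs.append("R")
        elif cls == "L":
            dirs.append("L")
        else:
            dirs.append(None)
    n = len(dirs)
    # 2. A run of neutrals between two R characters becomes R; any other
    #    neutral run takes the paragraph direction (L). Line ends count as L.
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else "L"
        after = dirs[j] if j < n else "L"
        fill = "R" if before == after == "R" else "L"
        for k in range(i, j):
            dirs[k] = fill
        i = j
    # 3. Emit runs left to right, reversing every maximal R run.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and dirs[j] == dirs[i]:
            j += 1
        chunk = text[i:j]
        out.append(chunk[::-1] if dirs[i] == "R" else chunk)
        i = j
    return "".join(out)

logical = "abc \u05d0\u05d1 \u05d2\u05d3 def"   # two Hebrew words inside English
print(toy_display_order(logical))   # abc, then the Hebrew words mirrored, then def
```

The input string is never modified. Only the returned copy is in display order, which mirrors the distinction between logical and display order in the quoted text.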