
Developing multi-lingual support as part of the development of Rashumon word processor

Michael N. Haephrati, 9 Oct 2013
When I developed Rashumon, there was no built-in support for multi-lingual, bi-directional text, so I had to develop it from scratch.

Download Rashumon DTP version

Introduction

This article describes the development of Rashumon and how basic building blocks had to be written from scratch because no SDK provided them.

Background

Between 1989 and 1994 I developed Rashumon, the first multi-lingual graphic word processor for the Amiga. Rashumon brought some unique features:

  • Multiple selection of text (selecting non-contiguous parts of the text simultaneously)

  • Table generator

  • Multiple key maps support (up to 5 simultaneously)

  • Search and replace includes color, style and font filters

    Rashumon - Search / Replace

  • Multi-lingual string gadgets to be used for creating and renaming files, drawers, etc.

  • Imports and exports multilingual ASCII files from and to PC and MAC

  • Ultra fast screen updating and scrolling

  • IFF graphics support (import and export).

  • Direct access to each one of the 256 characters per each font.


Using the code

The code samples used in this article are taken from the Rashumon source code. They were written in pre-ANSI (K&R) C for the Aztec C compiler on the Amiga, so a modern C or C++ compiler will need minor adjustments before it accepts them.

Points of Interest

Nowadays we tend to forget some of the complexity that was part of coding 20 years ago and that today is built into any operating system SDK, including bi-directional editing, text editing in general, scrolling text and word wrapping.

Developing a multi-lingual graphic word processor for the Amiga back in 1989 in fact required writing parts of what would today be part of the operating system but was missing back then.

Developing Rashumon

The Amiga was, and still is, a great computer with great capabilities, especially when it comes to sound and video. However, basic elements such as a file browse dialog box were missing, not to mention any support for right-to-left languages.

Today, all operating systems contain the core elements required to support multi-lingual text editing. The text is stored in the order in which it was typed (its logical order) and is displayed reversed when it belongs to a right-to-left language. This makes it simple to edit and manipulate, since the storage reflects the logical flow of the text. Back then, such a building block had to be developed, and displaying the text in a different order from the one in which it was stored would have made my word processor too slow. Instead, I stored the text in its display order and developed a word wrapping engine of my own.

Word wrapping is the mechanism that breaks lines without breaking words. Unlike the old typewriter, where you reached the end of the line and sometimes had to break a word in the middle, a word processor moves the last typed word to the next line when there is not enough room for it on the current line.

That becomes even more complex when you have to deal with proportional fonts, where each character has its own width, and when you allow several fonts to be combined. None of this was available through a high-level API. It required calculating the expected length of a given text in pixels, taking into account the width of each character based on the character itself, its font, attributes (bold, italics) and size, along with the selected margins.

So even when dealing with only one editing direction (left to right), it was still complicated to develop from scratch.

To begin with, I wrote the routine to calculate the length of a given line:
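To illustrate the pixel measurement described above, here is a minimal sketch in modern C. It is not taken from the Rashumon source: the `FontMetrics` structure, the per-character `font_of` array and both function names are assumptions made for this example. It sums per-character widths across mixed fonts and finds how many characters fit on a line without splitting a word.

```c
#include <stddef.h>

/* Hypothetical font description: one pixel width per character code. */
struct FontMetrics {
    unsigned char width[256];
};

/* Sum the pixel widths of the first n characters of txt.
   font_of[i] is the index of the font used by character i. */
int text_pixel_width(const unsigned char *txt, const unsigned char *font_of,
                     size_t n, const struct FontMetrics *fonts)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += fonts[font_of[i]].width[txt[i]];
    return total;
}

/* Return how many leading characters fit into max_pixels without
   splitting a word. If even the first word is too wide, the word is
   split at the last character that still fits. */
size_t fit_without_breaking(const unsigned char *txt,
                            const unsigned char *font_of, size_t n,
                            const struct FontMetrics *fonts, int max_pixels)
{
    int used = 0;
    size_t i, last_break = 0;
    for (i = 0; i < n; i++) {
        used += fonts[font_of[i]].width[txt[i]];
        if (used > max_pixels)
            return last_break ? last_break : i;
        if (txt[i] == ' ')
            last_break = i + 1;   /* a line may end right after a space */
    }
    return n;                     /* the whole text fits */
}
```

With a monospaced font every entry of `width` is the same, and the first function reduces to the "number of characters multiplied by the character width" case.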

int LLen=(mode==GRAPHICS)?ln->Len:(ln->CharLen=strlen(ln->Txt))*8; 

As you can see, in the simple scenario where "mode" isn't "GRAPHICS", the length is the number of characters multiplied by 8 (the width in pixels of each character when a monospaced font is used). In "GRAPHICS" mode, the pixel length already stored in ln->Len is used instead.

When it comes to editing bi-directional text using multiple proportional fonts, even inserting a single character is more complex:

// This is the routine for adding a single character, taken
// from the Rashumon source code

static put_char(wrap,chr)
BOOL wrap;
UBYTE chr;
{
    UBYTE c[2];
    BOOL update=FALSE;

    c[1]='\0';
    c[0]=chr;
    if(ENGLISH_H)
    // Left to right text
    {
        if(chr>='a' && chr<='z' && map[Header->format >> 11]<2) chr=ucase(chr);
        if(Header->Insert || !HCL->Txt[CR])
        {
            if(!wrap && HCL->Len+font_width(HCL,CR)>HCL->MaxLen) return;
            char_insert(HCL,c[0],CR);
            HCL->CharLen++;
            CR++;
            // Here we add the width of the new character to the
            // overall line size in pixels
            HCL->Len+=font_width(HCL,CR-1);
            HCL->CursLen+=font_width(HCL,CR-1);
        }
        else
        /* OVERWRITE IN ENGLISH */
        {
            // Now we treat Overwrite mode
            HCL->Txt[CR]=c[0];
            HCL->Format[CR]=Header->format;
            if(c[0]==9)
            {
                SetFlag(HCL->Format[CR],TAB);
                HCL->Txt[CR]=SPACE_CHR;
                SetFlag(MD,TABS);
            }
            CR++;
            calc_charlen(HCL);
            calc_all(HCL);
            Clear(HCL);
        }
    }
    else
    // Hebrew (or Right to Left) mode
    {
        if(!Header->Insert && CR)
        {
            CR--;
            HCL->CursLen-=font_width(HCL,CR+1);
            HCL->Txt[CR]=c[0];
            HCL->Format[CR]=Header->format;
            if(c[0]==9)
            {
                SetFlag(HCL->Format[CR],TAB);
                HCL->Txt[CR]=SPACE_CHR;
                SetFlag(MD,TABS);
            }
            calc_all(HCL);
            Clear(HCL);
        }
        else
        {
            if(!wrap && HCL->Len+font_width(HCL,CR)>HCL->MaxLen) return;
            char_insert(HCL,c[0],CR);
            HCL->Len+=font_width(HCL,CR);
            HCL->CharLen++;
        }
    }
    if(HCL->Mode & TABS) calc_all(HCL);
    if(c[0]!=SPACE_CHR && fonts[Header->format >> 11]->tf_YSize>LH(HCL))
    {
        HCL->LineHeight=Header->LineHeight=fonts[Header->format >> 11]->tf_YSize;
        HCL->BaseLine=Header->BaseLine=fonts[Header->format >> 11]->tf_YSize-fonts[Header->format >> 11]->tf_Baseline;
        update=TRUE;
    }
    else
    if(c[0]==SPACE_CHR && HCL->prev && !(HCL->Mode & PAR) && wrap)
    {
        WrapLine(HCL->prev,!(update));
    }
    if(HCL->Len<=HCL->MaxLen && !(update))
    {
        showtext(HCL);
        SetCursor();
    }
    else
    if(wrap)
        FixLine(HCL,!update);

    if(update)
        update_lh(HCL,TRUE);
}

The next step was to word wrap bi-directional lines. As I have explained, the lines were displayed exactly as they were stored in memory: the text "abcאבג" was stored just as it looks. Rashumon used two bytes per character. That was before Unicode, so the first byte was sufficient to store any character in any supported language. At that time, ASCII came in two forms: the basic form used the values 0 to 127, and the extended form used the values 0 to 255. I used the extended form and had to decide where to place the right-to-left languages.

There was no standard for right-to-left languages. IBM used positions 128 to 154, but I found that problematic and chose the positions starting at 224. That seems to have been the right choice, as it is where Hebrew letters are placed in the 8-bit encodings still in use today, such as ISO 8859-8 and Windows-1255. So if I open a floppy disk image from 1989 (an .ADF file), all Rashumon Hebrew documents appear in the correct encoding.

As for the second value, the format word, it stored the character color (3 bits, meaning up to 8 colors), the font attributes (bold, italics and underline, or any combination of the three), the language (right to left or left to right) and the font, as an index into a local list built from all the fonts used in the document.
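As a small worked example of the two layouts mentioned above, the sketch below remaps text from the IBM positions (128 to 154) to the positions starting at 224. It is only an illustration: the function name is hypothetical, and it assumes that both layouts list the letters in the same order, so that a fixed offset is enough.

```c
#include <stddef.h>

#define IBM_HEBREW_FIRST 128   /* first Hebrew letter in the IBM layout    */
#define IBM_HEBREW_LAST  154   /* last Hebrew letter in the IBM layout     */
#define HEBREW_BASE      224   /* first Hebrew letter in the 224-based one */

/* Convert a buffer in place. Characters outside the IBM Hebrew range
   (Latin letters, digits, punctuation) are left untouched. */
void ibm_to_224(unsigned char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (buf[i] >= IBM_HEBREW_FIRST && buf[i] <= IBM_HEBREW_LAST)
            buf[i] = (unsigned char)(buf[i] - IBM_HEBREW_FIRST + HEBREW_BASE);
}
```

The reverse direction would subtract the same offset for characters in the range 224 to 250.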

/* Line structure */
#define COLOR_BIT_1 1           /* 1  */
#define COLOR_BIT_2 2           /* 2  */
#define COLOR_BIT_3 4           /* 3  */
#define UNDL 8                  /* 4  */
#define BOLD 16                 /* 5  */
#define ITAL 32                 /* 6  */
#define SELECT 64               /* 7  */
#define LANG 128                /* 8  */
#define TAB 256                 /* 9  */
#define UNUSED_1 512            /* 10 */
#define UNUSED_2 1024           /* 11 */
/* bits 12-16: font index, read as (format >> 11) */
#define FONT_BIT_1 2048         /* 12 */
#define FONT_BIT_2 4096         /* 13 */
#define FONT_BIT_3 8192         /* 14 */
#define FONT_BIT_4 16384        /* 15 */
#define FONT_BIT_5 32768        /* 16 */
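The sketch below shows one way such a packed format word can be built and read back. It is an illustration, not Rashumon code: the `FMT_` names and helper functions are hypothetical, the word is assumed to be 16 bits wide, and the font index is assumed to occupy the top five bits because `put_char` reads it with `Header->format >> 11`. Treating a set language bit as "right to left" is also an assumption.

```c
typedef unsigned short FormatWord;   /* assumed to be 16 bits wide */

#define FMT_COLOR_MASK 0x0007u       /* bits 1-3: color, 0..7            */
#define FMT_UNDL       0x0008u
#define FMT_BOLD       0x0010u
#define FMT_ITAL       0x0020u
#define FMT_LANG       0x0080u       /* assumed: set means right to left */
#define FMT_FLAG_MASK  0x07F8u       /* bits 4-11: attribute flags       */
#define FMT_FONT_SHIFT 11            /* bits 12-16: font index, 0..31    */

/* Pack a color, a font index and attribute flags into one word. */
FormatWord fmt_make(unsigned color, unsigned font, unsigned flags)
{
    return (FormatWord)((color & FMT_COLOR_MASK) |
                        (flags & FMT_FLAG_MASK) |
                        ((font & 31u) << FMT_FONT_SHIFT));
}

unsigned fmt_color(FormatWord f)  { return f & FMT_COLOR_MASK; }
unsigned fmt_font(FormatWord f)   { return (unsigned)f >> FMT_FONT_SHIFT; }
int      fmt_is_rtl(FormatWord f) { return (f & FMT_LANG) != 0; }
int      fmt_is_bold(FormatWord f){ return (f & FMT_BOLD) != 0; }
```

Keeping all per-character attributes in a single word means that copying, comparing or searching by style needs only integer operations, which suits the color, style and font filters of the search and replace feature.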

Key Mapping and Encoding

A key map is an array holding the character produced by each key, ordered by key position: the top row of the keyboard comes first (the key to the left of "1", then "1" and onwards), and the remaining keys follow to the end of the array.

Here is another part taken from Rashumon source code, where the keymaps are defined:

/* HEBREW AND ENGLISH MAPS */
 unsigned char regmap[] = 
       ";1234567890-=\\0/'-˜€ˆ...\"[] 123(tm)ƒ‚‹'‰‡ŒŠ",  456 †'„Ž-š•. .789 ";
 
 unsigned char engmap[] = 
       "`1234567890-=\\0qwertyuiop[] 123asdfghjkl;'  456zxcvbnm,./ .789 ";
 
 unsigned char shiftmap[] = 
       "~!@#$%^&*()_+|0QWERTYUIOP{} 123ASDFGHJKL:\"  456 ZXCVBNM<>?.789 ";
 
 unsigned char shiftrus[] = 
       "~!@#$%^&*()_+|0°¶₪±³¸´¨(r)¯{} 123 ²£¥¦§(c)׫:\"  456 ¹·¢µ¡­¬<>? .789 ";
 
 unsigned char rusmap[] = 
       "`1234567890-=\\0׀ײִׁ׃״װָ־ֿ[] 123ְֳֵֶַֹֺֻׂ;'  456 ׳ֲױֱּֽ,./ .789 ";
  

As you can see, "regmap" is the Hebrew map, "engmap" is for Latin text, "shiftmap" is for the characters typed with the SHIFT key held down, and there were also key maps for Russian (and later on one for Arabic as well, thanks to John Hajjer from Chicago, who spent a lot of time helping me release an Arabic version).

Switching between the two directions was done using a unique ruler with two versions, left to right and right to left:

(Thanks to Shimon Wiessman for the graphic design)

Pressing the arrow changed the direction of editing.
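A lookup in such a table can be sketched as follows. The two strings are copied from the listing above. The `map_key` function and the idea of passing a zero-based key position are assumptions for this example, since the article does not show how Rashumon indexed its maps.

```c
#include <string.h>

static const unsigned char engmap[] =
    "`1234567890-=\\0qwertyuiop[] 123asdfghjkl;'  456zxcvbnm,./ .789 ";

static const unsigned char shiftmap[] =
    "~!@#$%^&*()_+|0QWERTYUIOP{} 123ASDFGHJKL:\"  456 ZXCVBNM<>?.789 ";

/* Translate a key position into a character using the plain or the
   shifted map. Returns 0 when the position is outside the map. */
unsigned char map_key(unsigned key, int shift)
{
    const unsigned char *map = shift ? shiftmap : engmap;
    size_t len = strlen((const char *)map);
    return key < len ? map[key] : 0;
}
```

Supporting another language then only requires another array and a way to choose the active one, which is presumably what made several simultaneous key maps practical.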

Scrolling Text

Even obvious things such as scrolling had to be invented back then. That includes determining how many lines of text to display, based on the current window size (Amiga windows had the ability to be resized by the end user, as well as maximized and minimized), displaying a scroll bar, and calculating the size of the scroll bar's gauge which should be proportional to the possible movement and the available scroll.

scroll(ln,lines)
struct Line *ln;
int lines;
{ 
       register SHORT distance,
                     top=TOP,
                     bot=BOT;
#if DEBUG
       printf("BEFORE: top=%ld (%ld <> TOP=%ld) ",
                     Header->top->num,
                     Header->top->y,TOP+Header->shift);
       printf("bottom=%ld (%ld <> BOT=%ld)\n",
                     Header->bottom->num,
                     Header->bottom->y+LH(Header->bottom),BOT+Header->shift);
#endif
       if(lines>0)
       {
              distance=Header->bottom->next->y+LH(Header->bottom->next)-Header->shift-Header->Height;
              Header->shift+=distance;
              while(Header->top->y<Header->shift) 
                     Header->top=Header->top->next;
              Header->bottom=Header->bottom->next;
       }
       else
       {
              distance=-(Header->shift-Header->top->prev->y);
              Header->shift+=distance;
              Header->top=Header->top->prev;
              while(Header->bottom->y+LH(Header->bottom)>Header->Height+Header->shift)
                     Header->bottom=Header->bottom->prev;
       }
       if(distance<100)
              ScrollRaster(rp,0,distance,0,TOP,640,BOT);
       else
              calc_top_bottom(TRUE,0,0);
#if DEBUG
       printf("AFTER: top=%ld (%ld <> TOP=%ld) ",
                     Header->top->num,
                     Header->top->y,TOP+Header->shift);
       printf("bottom=%ld (%ld <> BOT=%ld)\n",
                     Header->bottom->num,
                     Header->bottom->y+LH(Header->bottom),BOT+Header->shift);
#endif
} 
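The scroll bar sizing mentioned before the listing comes down to simple proportions. The following sketch is not from the Rashumon source; the function names and the pixel-based parameters are assumptions. The knob covers the same share of the track that the visible area covers of the whole document, and its offset follows the current scroll shift.

```c
/* Length of the scroll bar knob in pixels.
   track_px:   length of the scroll bar track
   visible_px: height of the visible text area
   total_px:   height of the whole document
   min_px:     smallest knob that can still be grabbed */
int knob_length(int track_px, int visible_px, int total_px, int min_px)
{
    long len;
    if (total_px <= visible_px)
        return track_px;                  /* everything is visible */
    len = (long)track_px * visible_px / total_px;
    if (len < min_px)
        len = min_px;
    return (int)len;
}

/* Offset of the knob from the top of the track for a given shift. */
int knob_offset(int track_px, int knob_px, int shift_px,
                int visible_px, int total_px)
{
    int max_shift = total_px - visible_px;
    if (max_shift <= 0)
        return 0;                         /* nothing to scroll */
    if (shift_px < 0) shift_px = 0;
    if (shift_px > max_shift) shift_px = max_shift;
    return (int)((long)(track_px - knob_px) * shift_px / max_shift);
}
```

Both values have to be recalculated whenever the window is resized or the document grows, because either event changes the ratio between the visible and the total height.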

Word Wrapping Bi-Directional Text

But now let's go back to word wrapping bi-directional text. The algorithm, which I developed along with Tamar Even-Zohar and her husband Nimrod, examines a given line. If the line is longer than the space between the two margins (the line width is calculated in pixels, taking into account each character and its individual attributes), we remove the last word from it and check the new length, and so on, until the line fits within the margins.

The first question to ask is: where is the "last" word? In a right-to-left paragraph, the last word appears first in the buffer.

In that case, I used the following function, which measures the length of the first word in a given buffer. The following routines assume a monospaced font, which is complicated enough…

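/* returns the index just past the first word in s;
   *blnks1 receives the index where the word starts and
   *blnks2 the number of blanks that follow the word */
#define BLNK(c)      ((c)==' ' || (c)=='\n')
first_wordlen(s,margin,blnks1,blnks2)
char *s;
int margin, *blnks1, *blnks2;
{
    register int i, j;
    for (i=margin; BLNK(s[i]) && s[i]; i++);   /* skip leading blanks */
    *blnks1 = i;
    for (; !(BLNK(s[i])) && s[i]; i++);        /* skip the word itself */
    for (j=i; BLNK(s[j]) && s[j]; j++);        /* count trailing blanks */
    *blnks2 = j-i;
    return(i);
}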

If the line is a left to right one, a different function was used:

last_wordlen(s,blnks1,blnks2,maxlen)
char *s;
int *blnks1, *blnks2, maxlen;
{
    register int i, j;
    if (!strlen(s)) return(0);
    for (i=strlen(s)-1; BLNK(s[i]) && i; i--);
    if (i==0) return(0);
    *blnks1 = (strlen(s)-(i+1));
    for (i=min(maxlen,strlen(s)-1); BLNK(s[i]) && i; i--);
    for (; !(BLNK(s[i])) && i; i--);
    for (j=i; BLNK(s[j]) && j; j--);
    i++;
    *blnks2 = i - j;
    return(strlen(s)- i);
}

Of course, we do not only remove the last word from a line. We also bring back the first word of the next line when space becomes available (for example, when the first word in the current line is deleted). So another building block moves the next word, from the beginning of the next line, back to the end of the current line.

/* copies the first word of length len and its trailing blanks
   blnks from s2 to s1 */
copy_first(s1,s2,len,blnks)
char *s1,*s2;
int len,blnks;
{
       append(s2,s1,strlen(s1)+len+(blnks? 1 : 0));
       delete1(s2,0,len+(blnks ?1 : 0));
}
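Putting these building blocks together, the "remove the last word until the line fits" loop can be sketched as below for the simplest case: a left-to-right line in a monospaced font, with lengths counted in characters. This is a simplified illustration in modern C, not the Rashumon implementation. The function names, the fixed `LINE_MAX_CHARS` buffer size and the choice to ignore trailing blanks when measuring are all assumptions.

```c
#include <string.h>

#define LINE_MAX_CHARS 256   /* assumed size of every line buffer */

/* Length of s without its trailing blanks. */
static size_t trimmed_len(const char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        n--;
    return n;
}

/* Move trailing words from cur to the front of next until cur fits
   into max_len columns. Returns the number of words moved. */
int wrap_ltr(char *cur, char *next, size_t max_len)
{
    int moved = 0;
    while (trimmed_len(cur) > max_len) {
        size_t end = trimmed_len(cur);   /* one past the last word   */
        size_t start = end;              /* will become its first char */
        size_t word_len, next_len;

        while (start > 0 && cur[start - 1] != ' ')
            start--;
        if (start == 0)
            break;                       /* one over-long word: stop  */

        word_len = end - start;
        next_len = strlen(next);
        if (next_len + word_len + 2 > LINE_MAX_CHARS)
            break;                       /* no room in the next line  */

        /* Make room at the front of next and copy "word " into it. */
        memmove(next + word_len + 1, next, next_len + 1);
        memcpy(next, cur + start, word_len);
        next[word_len] = ' ';
        cur[start] = '\0';               /* cut the word off cur      */
        moved++;
    }
    return moved;
}
```

For a right-to-left line the same loop would take words from the start of the buffer instead of the end, and with proportional fonts the comparison against `max_len` would use pixel widths instead of character counts.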

In Rashumon, the direction of a paragraph was calculated automatically by examining the encoding of each character in each line and determining which direction was dominant. While brainstorming with Tamar Even-Zohar and her husband Nimrod, we realized that even the space character " " can have a direction, and we had to decide whether to have a Hebrew space character in addition to the Latin one. This became a must, as it is needed to word wrap paragraphs that combine multiple languages.

For example:

"This is an example of a paragraph with opposed direction languages. זוהי דוגמה לפיסקה עם שילוב של שתי שפות עם כיוונים מנוגדים"

The following clip demonstrates how bi-directional text is edited in Rashumon.

http://youtu.be/QNsqbp7mNOA

Now, if you change the margins, which word will "jump" to the next line, or "jump" back to the current line? The only way to determine that is by knowing the direction of each character (either right to left or left to right), including special characters such as tabs, spaces, commas, etc.
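The automatic detection of the dominant direction can be sketched by counting characters on each side. The code below is a guess at the general idea rather than the Rashumon routine: it assumes that codes from 224 upwards are right-to-left letters, that ASCII letters are left to right, that everything else is neutral, and that a tie falls back to left to right.

```c
#include <stddef.h>

enum Direction { DIR_LTR, DIR_RTL };

/* Decide the dominant direction of n characters of text.
   Assumptions: codes >= 224 are right-to-left letters, ASCII letters
   are left-to-right, all other characters are neutral and ignored. */
enum Direction dominant_direction(const unsigned char *txt, size_t n)
{
    size_t ltr = 0, rtl = 0, i;
    for (i = 0; i < n; i++) {
        unsigned char c = txt[i];
        if (c >= 224)
            rtl++;
        else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            ltr++;
    }
    return rtl > ltr ? DIR_RTL : DIR_LTR;   /* a tie counts as LTR */
}
```

Counting by character code alone treats spaces and punctuation as neutral, which is exactly the limitation described above: to wrap mixed paragraphs correctly, those characters need a direction of their own.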

Rashumon can still be downloaded from Aminet at this link.

An Article about Rashumon (UK)

Further reading

Rashumon web site

HarmonySoft web site

My Blog (Hebrew)

My Blog (English)

My Coding Blog

 Michael Haephrati CodeProject MVP 2013    

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Michael N. Haephrati
CEO
United States
Michael Haephrati is an entrepreneur, inventor and musician. He has worked on many ventures, starting with HarmonySoft, where he designed Rashumon, the first graphical multi-lingual word processor for the Amiga computer. During 1995-1996 he worked as a contractor with Apple in Cupertino. He worked at a research institute and made the first steps in developing the credit scoring field in Israel. He founded Target Scoring and developed a credit scoring system named ThiS, based on geographical statistical data, with the participation of VISA CAL, Isracard, Bank Leumi and Bank Discount (Target Scoring, being the VP Business Development of a large Israeli institute).

During 2000, he founded Target Eye, and developed the first remote PC surveillance and monitoring system, named Target Eye.

Other ventures included data cleansing (as part of the DataTune system, which was implemented in many organizations).



Last Updated 9 Oct 2013
Article Copyright 2011 by Michael N. Haephrati