
Developing Multi-Lingual Support as Part of the Development of Rashumon Word Processor

9 Oct 2013 · CPOL · 8 min read
When I developed Rashumon, there was no built-in support for multi-lingual / bi-directional text, and I had to develop it from scratch.
In this article, you will learn about the development of Rashumon and the basic building blocks that had to be developed because no SDK provided them.

Introduction

This article describes the development of Rashumon, and how its basic building blocks had to be written from scratch because no SDK provided them.

Image 1

Background

Between 1989 and 1994, I developed Rashumon, the first multi-lingual graphic word processor for the Amiga. Rashumon brought some unique features:

  • Multiple selection of text (selecting non-contiguous parts of the text simultaneously)

  • Table generator

    Image 2

  • Multiple key maps support (up to 5 simultaneously)

  • Search and replace includes color, style and font filters

    Rashumon - Search / Replace

  • Multi-lingual string gadgets to be used for creating and renaming files, drawers, etc.

  • Imports and exports multilingual ASCII files from and to PC and Mac

  • Ultra fast screen updating and scrolling

  • IFF graphics support (import and export).

  • Direct access to each one of the 256 characters per font.

Developing_Rashumon/9.jpg

Using the Code

The code samples used in this article are taken from the Rashumon source code. They can be examined with any C or C++ development environment, even though they were written for the Aztec C compiler on the Amiga, so expect pre-ANSI (K&R) style function definitions.

Points of Interest

Nowadays, we tend to forget some of the complexity that was part of coding 20 years ago and is today built into any operating system SDK: bi-directional editing, text editing in general, text scrolling and word wrapping.

Developing a multi-lingual graphic word processor for the Amiga back in 1989 in fact required writing parts of what would today be part of the operating system, but was missing back then.

Developing Rashumon

The Amiga was and still is a great computer, with great capabilities, especially when it comes to sound and video. However, basic elements such as a File Browse dialog box were missing, not to mention any support for right-to-left languages.

Today, all operating systems contain the core elements required to support multi-lingual text editing. The text is stored in the order in which it was typed (its logical order) and, for a right-to-left language, displayed in reverse. This makes it simple to edit and manipulate, since the storage reflects the logical flow of the text.

Back then, I had to develop such building blocks myself, and displaying the text differently from the way it was stored would have made my word processor too slow. Instead, I stored the text exactly as it is displayed and developed a word wrapping engine of my own. Word wrapping is the mechanism that breaks lines without breaking words. Unlike the old typewriter, where you reach the end of the line and sometimes have to break a word in the middle, a word processor moves the last typed word to the next line when there is not enough room for it on the current line.
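To make the cost of the modern approach concrete, here is a minimal sketch (not taken from the Rashumon source) of what a renderer has to do when text is kept in typing order: every right-to-left run must be reversed before it is drawn. The function name `rtl_run_to_visual` and the single-byte character assumption are mine.

```c
#include <stddef.h>

/* Hypothetical helper: copy the n characters of a right-to-left run from
   logical (typing) order into visual (left-to-right screen) order.
   `visual` must have room for n + 1 bytes. */
void rtl_run_to_visual(const unsigned char *logical,
                       unsigned char *visual, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        visual[i] = logical[n - 1 - i];   /* last typed ends up leftmost */
    visual[n] = '\0';
}
```

Rashumon avoided doing this on every redraw by keeping the buffer in display order, at the price of more complicated editing code.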

That becomes even more complex when you have to deal with proportional fonts, where each character has its own width, and when you allow several fonts to be combined. None of this was available through a high-level API. It required calculating the predicted length of a given text in pixels, taking into account the width of each character based on the character itself, its font, attributes (bold, italics) and size, along with the selected margins.

So even if we deal with only one direction of editing (left to right), it was still complicated to develop from scratch.
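The pixel calculation described above can be sketched as follows. This is an illustration under my own assumptions, not Rashumon's code: `DemoFont`, `ATTR_BOLD`, `text_width` and `line_fits` are hypothetical names, and the idea that bold adds a fixed number of pixels per character is a simplification.

```c
#include <stddef.h>

#define ATTR_BOLD 1   /* hypothetical attribute flag */

/* Hypothetical font: one advance width per character code, in pixels. */
struct DemoFont {
    unsigned char width[256];
    int bold_extra;            /* extra pixels added when a character is bold */
};

/* Sum the pixel width of n characters. font_of[i] selects the font of
   character i, and attr[i] holds its attribute flags. */
int text_width(const unsigned char *txt, const unsigned char *font_of,
               const unsigned char *attr, size_t n,
               const struct DemoFont *fonts)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        const struct DemoFont *f = &fonts[font_of[i]];
        total += f->width[txt[i]];
        if (attr[i] & ATTR_BOLD)
            total += f->bold_extra;
    }
    return total;
}

/* A line fits when its text is no wider than the space between the margins. */
int line_fits(int text_pixels, int left_margin, int right_margin)
{
    return text_pixels <= right_margin - left_margin;
}
```

Because every character can have its own font and attributes, the width has to be accumulated character by character; there is no shortcut such as multiplying the character count by a constant.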

To begin with, I wrote the routine to calculate the length of a given line:

C++
int LLen=(mode==GRAPHICS)?ln->Len:(ln->CharLen=strlen(ln->Txt))*8; 

As you can see, in the simple scenario where "mode" isn't "GRAPHICS", the length is the number of characters multiplied by 8 (the width, in pixels, of each character when a monospaced font is used). In "GRAPHICS" mode, the stored pixel length "ln->Len" is used instead.

When it comes to editing bi-directional text with proportional and multiple fonts, even inserting a single character is more complex:

C++
// This is a routine for adding a single character, taken
// from Rashumon source code
 
static put_char(wrap,chr)
BOOL wrap;
UBYTE chr;
{
    UBYTE c[2];
    BOOL update=FALSE;
 
    c[1]='\0';
    c[0]=chr;
    if(ENGLISH_H)
    // Left to right text
    {
        if(chr>='a' && chr<='z' && map[Header->format >> 11]<2) chr=ucase(chr);
        if(Header->Insert || !HCL->Txt[CR])
        {
            if(!wrap && HCL->Len+font_width(HCL,CR)>HCL->MaxLen) return();
            char_insert(HCL,c[0],CR);
            HCL->CharLen++;
            CR++;
            HCL->Len+=font_width(HCL,CR-1);   

// Here, we add the additional size to the overall line 
//size in pixels
            HCL->CursLen+=font_width(HCL,CR-1);
        }
        else /* OVERWRITE IN ENGLISH */
        {
// Now, we treat Overwrite mode
        HCL->Txt[CR]=c[0];
        HCL->Format[CR]=Header->format;
        if(c[0]==9) 
        {
            SetFlag(HCL->Format[CR],TAB);
            HCL->Txt[CR]=SPACE_CHR;
            SetFlag(MD,TABS);
        }
        CR++;
        calc_charlen(HCL);
        calc_all(HCL);
        Clear(HCL);
    }
}
else
// Hebrew ( or Right to Left) mode
{
    if(!Header->Insert && CR)
    {
        CR--;
        HCL->CursLen-=font_width(HCL,CR+1);
        HCL->Txt[CR]=c[0];
        HCL->Format[CR]=Header->format;
        if(c[0]==9) 
        {
            SetFlag(HCL->Format[CR],TAB);
            HCL->Txt[CR]=SPACE_CHR;
            SetFlag(MD,TABS);
        }
        calc_all(HCL);
        Clear(HCL);
    }
    else
    {
        if(!wrap && HCL->Len+font_width(HCL,CR)>HCL->MaxLen) return();
        char_insert(HCL,c[0],CR);
        HCL->Len+=font_width(HCL,CR);
        HCL->CharLen++;
    }
}
if(HCL->Mode & TABS) calc_all(HCL);
if(c[0]!=SPACE_CHR && fonts[Header->format >> 11]->tf_YSize>LH(HCL)) 
{
    HCL->LineHeight=Header->LineHeight=fonts[Header->format >> 11]->tf_YSize;
    HCL->BaseLine=Header->BaseLine=fonts[Header->format >> 11]->
         tf_YSize-fonts[Header->format >> 11]->tf_Baseline;
    update=TRUE;
}
else
if(c[0]==SPACE_CHR && HCL->prev && !(HCL->Mode & PAR) && wrap)
{
    WrapLine(HCL->prev,!(update));
}
if(HCL->Len<=HCL->MaxLen && !(update))
{
    showtext(HCL);
    SetCursor();
}
else
if(wrap)
    FixLine(HCL,!update);
 
if(update)
    update_lh(HCL,TRUE);
  
} 

The next step was to word wrap bi-directional lines. As I have explained, the lines were displayed exactly as they were stored in memory, so the text "abcאבג" was stored just as it looks. Rashumon used double-byte characters, meaning that each character was stored using 2 bytes. That was before Unicode came into use, so the first byte was sufficient to store any character in any supported language. At that time, ASCII came in two forms: the basic form used the values 0 to 127, and the extended form used the values 0 to 255. I used the extended form, and had to decide where to place the right-to-left languages.

There was no standard for right-to-left languages. IBM used positions 128 to 154, but I found that problematic and chose the positions starting at 224. That seems to have been the right choice, as it matches where Hebrew letters sit in today's single-byte code pages such as ISO 8859-8 and Windows-1255. So if I open a floppy disk image from 1989 (an .ADF file), all Rashumon Hebrew documents appear in the correct encoding.
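Converting text between the two placements mentioned above is a matter of shifting a range of byte values. The sketch below is not from the Rashumon source. It assumes that both layouts list the letters in the same order, so that a constant offset maps one onto the other, and the names `ibm_to_rashumon` and `convert_buffer` are hypothetical.

```c
#include <stddef.h>

/* Positions taken from the text: the IBM layout used 128..154 for the
   Hebrew letters, and Rashumon placed them from 224 upward. */
#define IBM_HEB_FIRST   128
#define IBM_HEB_LAST    154
#define RASH_HEB_FIRST  224

/* Assumption: same letter order in both layouts, so a constant shift works. */
unsigned char ibm_to_rashumon(unsigned char c)
{
    if (c >= IBM_HEB_FIRST && c <= IBM_HEB_LAST)
        return (unsigned char)(c - IBM_HEB_FIRST + RASH_HEB_FIRST);
    return c;   /* everything else is left untouched */
}

/* Convert a whole buffer in place, for example after importing a PC file. */
void convert_buffer(unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = ibm_to_rashumon(buf[i]);
}
```

With these constants, position 128 maps to 224 and position 154 maps to 250, while plain ASCII passes through unchanged.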

As for the 2nd byte, it was used to store the character color (3 bits, meaning up to 8 colors), the font attributes (bold, italics and underline, or any combination of the three), the language (right to left or left to right) and the font, as an index into a local list built from all the fonts used in the document.

C++
/* Line structure */
#define COLOR_BIT_1 1           /* 1  */
#define COLOR_BIT_2 2           /* 2  */
#define COLOR_BIT_3 4           /* 3  */
#define UNDL 8                  /* 4  */
#define BOLD 16                 /* 5  */
#define ITAL 32                 /* 6  */
#define SELECT 64               /* 7  */
#define LANG 128                /* 8  */
#define TAB 256                 /* 9  */
#define UNUSED_1 512            /* 10 */
#define UNUSED_2 1024           /* 11 */
#define FONT_BIT_1 2048         /* 12 */
#define FONT_BIT_2 4096         /* 13 */
#define FONT_BIT_3 8192         /* 14 */
#define FONT_BIT_4 16384        /* 15 */
#define FONT_BIT_5 32768        /* 16 */ 
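As an illustration of how such a packed format value can be built and read back, here is a minimal sketch. It is not Rashumon's code: it assumes a 16-bit format word with the color in the three lowest bits and a 5-bit font index in the top bits, which is what the `Header->format >> 11` expressions in `put_char` suggest. All `FMT_` names and helper functions are hypothetical.

```c
typedef unsigned short FormatWord;   /* assumed to be 16 bits wide */

#define FMT_COLOR_MASK  0x0007u      /* three color bits: up to 8 colors */
#define FMT_UNDL        0x0008u
#define FMT_BOLD        0x0010u
#define FMT_ITAL        0x0020u
#define FMT_LANG        0x0080u      /* direction flag of the character */
#define FMT_FONT_SHIFT  11           /* assumed position of the font index */
#define FMT_FONT_MASK   0x001Fu      /* five bits: up to 32 fonts */

/* Pack a color, a set of attribute flags and a font index into one word. */
FormatWord make_format(unsigned color, unsigned flags, unsigned font_index)
{
    return (FormatWord)((color & FMT_COLOR_MASK) | flags |
                        ((font_index & FMT_FONT_MASK) << FMT_FONT_SHIFT));
}

unsigned format_color(FormatWord f)    { return f & FMT_COLOR_MASK; }
unsigned format_font(FormatWord f)     { return (unsigned)f >> FMT_FONT_SHIFT; }
int      format_lang_bit(FormatWord f) { return (f & FMT_LANG) != 0; }
```

Keeping everything in one word per character means a change of color, style or font never needs a separate run list: the format array simply runs in parallel with the text array.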

Key Mapping and Encoding

Developing_Rashumon/rashumon-screen.jpg

Each key map is an array holding the character produced by every key, indexed by the key's position, from position "1" to the end of the array.

Here is another part taken from Rashumon source code, where the keymaps are defined:

C++
/* HEBREW AND ENGLISH MAPS */
 unsigned char regmap[] = 
       ";1234567890-=\\0/'-˜€ˆ...\"[] 123(tm)ƒ‚‹'‰‡ŒŠ",  456 †'„Ž-š•. .789 ";
 
 unsigned char engmap[] = 
       "`1234567890-=\\0qwertyuiop[] 123asdfghjkl;'  456zxcvbnm,./ .789 ";
 
 unsigned char shiftmap[] = 
       "~!@#$%^&*()_+|0QWERTYUIOP{} 123ASDFGHJKL:\"  456 ZXCVBNM<>?.789 ";
 
 unsigned char shiftrus[] = 
       "~!@#$%^&*()_+|0°¶₪±³¸´¨(r)¯{} 123 ²£¥¦§(c)׫:\"  456 ¹·¢µ¡­¬<>? .789 ";
 
 unsigned char rusmap[] = 
       "`1234567890-=\\0׀ײִׁ׃״װָ־ֿ[] 123ְֳֵֶַֹֺֻׂ;'  456 ׳ֲױֱּֽ,./ .789 ";  

As you can see, "regmap" is the Hebrew key map, "engmap" is for Latin text, "shiftmap" is for the characters typed with the SHIFT key, and there was also a keymap for Russian (and later on, one for Arabic as well, thanks to John Hajjer, from Chicago, who spent a lot of time helping me release an Arabic version).
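A lookup through such tables can be sketched as below. This is only an illustration: the two short tables hold the first 13 keys of the top row as they appear in "engmap" and "shiftmap", and the function name `translate_key` and its out-of-range behavior are my own assumptions.

```c
#include <stddef.h>

/* First 13 keys of the top row, as they appear in engmap and shiftmap. */
static const unsigned char demo_plain[]   = "`1234567890-=";
static const unsigned char demo_shifted[] = "~!@#$%^&*()_+";

/* Translate a key position into a character. Returns 0 for positions
   outside the table. */
unsigned char translate_key(unsigned pos, int shift_down)
{
    const unsigned char *map = shift_down ? demo_shifted : demo_plain;
    size_t len = sizeof demo_plain - 1;   /* both tables have the same length */

    if (pos >= len)
        return 0;
    return map[pos];
}
```

Supporting another language then only requires another table of the same shape, selected by the current editing language, which fits the "multiple key maps" feature listed earlier.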

Switching between the two directions was made using a unique ruler with two versions: left to right and right to left:

(Thanks to Shimon Wiessman for the graphic design.)

Image 6

Image 7

Pressing the arrow changed the direction of editing.

Scrolling Text

Even obvious things such as scrolling had to be invented back then. That includes determining how many lines of text to display based on the current window size (Amiga windows could be resized by the end user, as well as maximized and minimized), displaying a scroll bar, and calculating the size of the scroll bar's gauge, which should be proportional to the visible part of the document relative to its full length.

C++
scroll(ln,lines)
struct Line *ln;
int lines;
{ 
       register SHORT distance,
                     top=TOP,
                     bot=BOT;
#if DEBUG
       printf("BEFORE: top=%ld (%ld <> TOP=%ld) ",
                     Header->top->num,
                     Header->top->y,TOP+Header->shift);
       printf("bottom=%ld (%ld <> BOT=%ld)\n",
                     Header->bottom->num,
                     Header->bottom->y+LH(Header->bottom),BOT+Header->shift);
#endif
       if(lines>0)
       {
              distance=Header->bottom->next->y+LH(Header->bottom->next)-Header->
                               shift-Header->Height;
              Header->shift+=distance;
              while(Header->top->y<Header->shift) 
                     Header->top=Header->top->next;
              Header->bottom=Header->bottom->next;
       }
       else
       {
              distance=-(Header->shift-Header->top->prev->y);
              Header->shift+=distance;
              Header->top=Header->top->prev;
              while(Header->bottom->y+LH(Header->bottom)>Header->Height+Header->shift)
                     Header->bottom=Header->bottom->prev;
       }
       if(distance<100)
              ScrollRaster(rp,0,distance,0,TOP,640,BOT);
       else
              calc_top_bottom(TRUE,0,0);
#if DEBUG
       printf("AFTER: top=%ld (%ld <> TOP=%ld) ",
                     Header->top->num,
                     Header->top->y,TOP+Header->shift);
       printf("bottom=%ld (%ld <> BOT=%ld)\n",
                     Header->bottom->num,
                     Header->bottom->y+LH(Header->bottom),BOT+Header->shift);
#endif
} 
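The scroll bar gauge mentioned before the listing is not shown in the article, so here is a minimal sketch of one common way to size and place it. The formulas and the names `knob_size` and `knob_offset` are my own assumptions, not Rashumon's code. All lengths are in pixels.

```c
/* Size of the gauge: the same fraction of the track as the visible part
   is of the whole document, but never smaller than min_knob. */
int knob_size(int track, int visible, int total, int min_knob)
{
    long size;

    if (total <= visible)
        return track;                 /* everything fits: gauge fills the track */
    size = (long)track * visible / total;
    if (size < min_knob)
        size = min_knob;
    return (int)size;
}

/* Position of the gauge for a given scroll offset (`shift`), where the
   largest possible offset is total - visible. */
int knob_offset(int track, int knob, int shift, int visible, int total)
{
    int max_shift = total - visible;

    if (max_shift <= 0)
        return 0;
    return (int)((long)(track - knob) * shift / max_shift);
}
```

For a 200-pixel track showing 100 pixels of a 400-pixel document, the gauge is 50 pixels long and travels over the remaining 150 pixels as the document scrolls from top to bottom.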

Word Wrapping Bi-Directional Text

But now, let's go back to word wrapping bi-directional text. The algorithm, which I developed together with Tamar Even-Zohar and her husband Nimrod, examines a given line and computes its width in pixels, taking each character into account according to its own attributes. If the line is wider than the space between the two margins, the last word is removed from it and the new width is checked again, and so on, until the line fits within the margins.

The first question to ask is where is the "last" word? If it is a right to left paragraph, the last word will appear first, in the buffer.

In such a case, I used the following function, which in fact measures the size of the first word in a given buffer. The following routines assume a monospaced font, so sizes are counted in characters rather than pixels, which is complicated enough…

C++
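/* returns the len of the first word in s */
#define BLNK(c)      ((c)==' ' || (c)=='\n')
first_wordlen(s,margin,blnks1,blnks2)
char *s;
int margin, *blnks1, *blnks2;
{
    register int i, j;

    /* skip the leading blanks, starting at the margin */
    for (i=margin; BLNK(s[i]) && s[i]; i++);
    *blnks1 = i;        /* index of the first character of the word */
    /* skip over the word itself */
    for (; !(BLNK(s[i])) && s[i]; i++);
    /* count the blanks that follow the word */
    for (j=i; BLNK(s[j]) && s[j]; j++);
    *blnks2 = j-i;
    return(i);          /* index just past the end of the word */
} 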

If the line is a left to right one, a different function was used:

C++
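last_wordlen(s,blnks1,blnks2,maxlen)
char *s;
int *blnks1, *blnks2, maxlen;
{
    register int i, j;

    if (!strlen(s)) return(0);
    /* skip the trailing blanks at the end of the string */
    for (i=strlen(s)-1; BLNK(s[i]) && i; i--);
    if (i==0) return(0);
    *blnks1 = (strlen(s)-(i+1));    /* number of trailing blanks */
    /* start again from maxlen, or from the end of the string if shorter */
    for (i=min(maxlen,strlen(s)-1); BLNK(s[i]) && i; i--);
    /* step back over the last word */
    for (; !(BLNK(s[i])) && i; i--);
    /* step back over the blanks that precede it */
    for (j=i; BLNK(s[j]) && j; j--);
    i++;                            /* i is now the first character of the word */
    *blnks2 = i - j;                /* distance from the end of the previous word */
    return(strlen(s)- i);           /* characters from the word to the end of s */
} 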

Of course, we not only remove the last word from a line, but also bring back the first word of the next line when space is available (for example, when the first word in the current line is deleted and space becomes available). So another building block is moving the next word (from the beginning of the next line) back to the end of the current line.

C++
/* copies first word of length len & trailing blanks
   blnks from s2 to s1 */
copy_first(s1,s2,len,blnks)
char *s1,*s2;
int len,blnks;
{
    append(s2,s1,strlen(s1)+len+(blnks ? 1 : 0));
    delete1(s2,0,len+(blnks ? 1 : 0));
} 
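Putting these building blocks together, the "remove the last word until the line fits" loop can be sketched for the simplest case: a left-to-right line in a monospaced font, where the limit is a character count. This is a simplified illustration, not the Rashumon implementation, and the name `wrap_ltr` and its buffer conventions are my own assumptions.

```c
#include <string.h>

#define IS_BLANK(c) ((c) == ' ' || (c) == '\n')

/* Move words from the end of `line` to the front of `next` until `line`
   is at most maxlen characters long. Both buffers must be large enough
   to hold the combined text. Returns the number of words moved. */
int wrap_ltr(char *line, char *next, size_t maxlen)
{
    int moved = 0;

    for (;;) {
        size_t len = strlen(line);
        size_t start, wlen, nlen, gap;

        /* trailing blanks do not count against the margin */
        while (len > 0 && IS_BLANK(line[len - 1]))
            line[--len] = '\0';
        if (len <= maxlen)
            break;

        /* find the start of the last word */
        start = len;
        while (start > 0 && !IS_BLANK(line[start - 1]))
            start--;
        if (start == 0)
            break;          /* a single word wider than the line: leave it */

        /* put the word, plus a separating blank if needed, in front of `next` */
        wlen = len - start;
        nlen = strlen(next);
        gap = nlen ? 1 : 0;
        memmove(next + wlen + gap, next, nlen + 1);
        memcpy(next, line + start, wlen);
        if (gap)
            next[wlen] = ' ';

        line[start] = '\0';
        moved++;
    }
    return moved;
}
```

For a right-to-left line stored in display order, the same loop would take words from the beginning of the buffer instead of the end, which is why both a "first word" and a "last word" routine are needed.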

In Rashumon, the direction of each paragraph was calculated automatically by examining the encoding of each character in each line and determining which direction is dominant. While brainstorming with Tamar Even-Zohar and her husband Nimrod, we realized that even the space character " " can have a direction, and we had to decide whether to have a Hebrew space character in addition to the Latin one. This turned out to be a must, as it is needed to word wrap paragraphs that combine multiple languages.

For example:

"This is an example of a paragraph with opposed direction languages. זוהי דוגמה לפיסקה עם שילוב של שתי שפות עם כיוונים מנוגדים"

The following clip demonstrates how bi-directional text is edited by Rashumon.

Now, if you change the margins, which word will "jump" to the next line, or "jump" back to the current line? The only way to determine that is by knowing the direction of each character (either right to left or left to right), including special characters such as tabs, spaces, commas, etc.
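The two ideas above, a dominant direction per line and an explicit direction for neutral characters, can be sketched as follows. This is an illustration only: it assumes right-to-left letters occupy the codes from 224 upward, as described earlier, and that a per-character flag recorded at typing time is available for spaces and punctuation. The names `char_direction`, `dominant_direction` and `resolved_direction` are hypothetical.

```c
#include <stddef.h>

#define RTL_FIRST 224   /* first right-to-left character code, per the text */

enum Direction { DIR_LTR, DIR_RTL };

/* Returns 1 for a right-to-left letter, -1 for a Latin letter and 0 for a
   neutral character such as a space, tab, digit or comma. */
int char_direction(unsigned char c)
{
    if (c >= RTL_FIRST)
        return 1;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        return -1;
    return 0;
}

/* The dominant direction of a line: whichever kind of letter is in the
   majority. Ties and lines without letters default to left to right. */
enum Direction dominant_direction(const unsigned char *txt, size_t n)
{
    long balance = 0;
    size_t i;

    for (i = 0; i < n; i++)
        balance += char_direction(txt[i]);
    return balance > 0 ? DIR_RTL : DIR_LTR;
}

/* Neutral characters carry no direction of their own, so the sketch falls
   back on a flag stored with the character when it was typed. */
enum Direction resolved_direction(unsigned char c, int typed_rtl)
{
    int d = char_direction(c);

    if (d > 0)
        return DIR_RTL;
    if (d < 0)
        return DIR_LTR;
    return typed_rtl ? DIR_RTL : DIR_LTR;
}
```

With a direction resolved for every character, a space typed in Hebrew mode stays attached to the Hebrew word next to it when words move between lines.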

Rashumon can still be downloaded from Aminet at this link.

An Article about Rashumon (UK)

Further Reading

History

  • 9th October, 2013: Initial version

Michael Haephrati , CodeProject MVP 2013

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
CEO Secured Globe, Inc.
United States
Michael Haephrati is a music composer, an inventor and an expert specializing in software development and information security, who has built a unique perspective that combines technology and the end user experience. He is the author of the book Learning C++, which teaches C++20 and was published in August 2022.

He is the CEO of Secured Globe, Inc., and also active at Stack Overflow.

Read our Corporate blog or read my Personal blog.




