Recently, I have been involved in localization of software applications for global markets. Although software localization and translation is usually (and hopefully) less complex and less expensive than the original development of the application, it is still a complex issue, and it can be difficult knowing how to get started.
In this article, I am summarizing some of the information I would have liked to have immediately available when I first considered localizing applications. This article is primarily targeted to programmers considering translation and localization of applications. It is intended to help you make the decision to proceed, and to point out some of the unexpected pitfalls you may encounter. It is by no means all the information you need on software translations and localization.
The information comes from research and my own experiences. As my experiences are incomplete, and limited to Latin languages, this article is limited to these areas as well. As I gain more experience with more complex translations, I hope to post updated articles on this subject. Of course, other programmers are welcome to add their comments and insights. I will apologize in advance for anything I say that is incorrect. None of this information is guaranteed accurate. After all, it is free!
It should be noted that there is a difference between translation and localization.
Translation is converting the application text to a different language, e.g. English to French.
Localization is customizing an application for a specific country or location, e.g. American English to U.K. English.
Of course, if you are translating software from English to French, you are probably localizing it as well.
Finally, I am using a writing style where the original language is assumed to be English. I hope this won't offend the rest of the planet... it is just easier!
When should you do the translation?
If at all possible, you should thoroughly complete and test the English version before beginning translation. It is much easier. Of course, the English version should have all strings in resources, and all date/time values should be formatted using system settings. This helps to eliminate bugs that could appear when moving strings into resources.
However, there may be good business reasons to develop English and translated versions simultaneously. Just remember that last minute changes to the user-interface will be even more difficult to accommodate than if you were developing in English alone. In this case, use of a translation toolkit is even more helpful.
How much translation do you want to do?
There are different degrees of software translation. Before beginning the translation, you should decide exactly how much of the application will be translated. For example, you may just want to translate the interface (menus, dialogs and resource strings), but not the documentation. Is it OK to have some obscure error messages in English? Maybe you want to translate on-line docs, but not printed docs. What about the installation program? All these components must be considered. You may even want to ship one version of the program with the resources for each language in a separate DLL, allowing the user to select the language for the interface.
Of course, if you only translate the interface, you end up with documentation that says "Select Import from the File menu", and the user has no "File" menu and no "Import" option. I think this is one reason that Microsoft has gone to "What's This" help, even though users never seem to use it or know that it is there.
Preparing Your Application for Translation
Writing an application that you know will be translated is not as easy as writing one for a single language. String issues are the most complex, but there are also issues with dialog sizes, icons and accelerator keys.
You probably know that one of the main reasons for using string resources is to simplify translation of the product, so make use of this feature. All strings displayed to the user, including formatting strings, should be retrieved from the resource string table.
Here are some other string handling tips:
- Try to use consistent error messages. For example, the following error messages all say the same thing in principle, but could all be worded exactly the same to be more efficient:
"The file could not be opened."
"Failed to open file."
"Failed to open the selected file."
Even if you have to generalize an error message somewhat, it may be worth it to simplify translation. Most programmers write vague and cryptic error messages anyway (but that's a whole other topic).
- Try to minimize the number of %s formatting items in each string. It is possible that a string with two %s items will require the items to be displayed in reverse order when translated. Of course, this is very difficult to handle within the code. Minimizing these complex strings reduces your chances of this occurrence.
- Try to make strings that make sense on their own. Your translator is not always going to understand the context in which each string is used. Short and vague strings are sure to generate a call from your translator asking for clarification about what it is trying to say.
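One way to soften the reordering problem is to use numbered placeholders instead of %s, so the translator can move the inserts around without any code change. The Win32 FormatMessage function (and CString::FormatMessage in MFC) accepts %1, %2 style inserts for this purpose. The following is only a minimal portable sketch of the idea; FormatPositional is a hypothetical helper written for illustration, not part of MFC or Win32.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces %1..%9 in 'pattern' with the matching entry
// of 'args'. "%%" produces a literal percent sign. Placeholders that have no
// matching argument are dropped; unknown sequences are copied unchanged.
std::string FormatPositional(const std::string& pattern,
                             const std::vector<std::string>& args)
{
    std::string result;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c != '%' || i + 1 == pattern.size()) {
            result += c;
            continue;
        }
        const char next = pattern[++i];
        if (next >= '1' && next <= '9') {
            const std::size_t index = static_cast<std::size_t>(next - '1');
            if (index < args.size())
                result += args[index];
        } else if (next == '%') {
            result += '%';
        } else {
            result += '%';
            result += next;
        }
    }
    return result;
}
```

With this approach the English resource string could be "Cannot copy %1 to %2." while a translation is free to place %2 before %1; the calling code passes the arguments in the same order either way.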
Translating English to another language can increase text size from 30% to as much as 100%, based on my experience. You will need to allow for extra space in your dialog boxes. Of course, you will be able to resize your dialogs before rebuilding the translated application, but it will be much easier if most of the dialogs and controls have sufficient size to begin with. You will have to balance funny looking English dialogs, with the amount of work involved in resizing translated dialogs.
Localizing MFC Print Preview
When localizing resources, don't forget about the print preview dialog bar and MFC print dialogs. Microsoft provides these resources in a variety of languages. Look in the VisualC\Mfc\Src directory. There is a sub-directory for each language. French resources are in l.fra.
These print preview resources are #include-ed into your RC file with the following code:
3 TEXTINCLUDE DISCARDABLE
"#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_FRA)\r\n"
"LANGUAGE 12, 1\r\n"
// non-Microsoft Visual C++ edited resources\r\n"
"#include ""l.fra\\afxres.rc"" // Standard components\r\n"
"#include ""l.fra\\afxprint.rc"" // printing/print preview resources\r\n"
Open your RC file as a text file to find this code section. In the example shown above, I am including the French resources for standard components and print preview. This is indicated in the following lines:
"#include ""l.fra\\afxres.rc"" // Standard components\r\n"
"#include ""l.fra\\afxprint.rc"" // printing/print preview resources\r\n"
Simply edit your RC file as a text file and include the appropriate resource from the appropriate subdirectory. Rebuild the application.
Localizing Property Pages
Property pages and wizard sheets tend to have buttons that say things like OK, Apply, Cancel, Finish, Next.
The Good News: Text on these buttons appears in the language of the operating system, so you don't have to translate it at all.
The Bad News: If your application is in a different language than the operating system, property pages and wizards will appear in mixed languages.
In one of my applications, I chose to derive a class from CPropertySheet and use it for all of my wizards and property sheets. Then I set each button caption to the correct text for the language of the application, not of the operating system. It helps to give a more consistent look when the OS and app language are not the same.
Sorting and Strings
Here is a question for you... If Chinese has no alphabet, how do you sort strings?
Well, I have asked several Chinese speakers, and never gotten a good answer to this question, but I do know this... sorting strings is the bane of many software translation projects, and it is not just a problem with Chinese. Almost any western language makes use of accented characters, characters with hats, umlauts, or funny German double s's. Most of these characters fall in the 128 to 255 range of the usual single-byte Western code pages, outside the 7-bit ASCII range. Consequently, a comparison based on raw character values may not sort words containing these characters correctly.
The solution is to make sure that all of your string comparison routines make use of the current user's system locale information. The documentation is a little weird, but I believe that the CString comparison operators and CString::Compare do not account for the locale setting. You must use CString::Collate, or strcoll and related functions. You had best test any sorting algorithms for success with each locale.
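As a minimal sketch of locale-aware sorting with strcoll, the function below sorts a list of narrow strings using the collation rules of the C runtime's current locale. SortForCurrentLocale is a hypothetical name chosen for this example, and the sketch assumes the strings are encoded in the code page of the active locale.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: sorts 'words' with strcoll, which compares strings
// according to the LC_COLLATE category of the current C locale.
void SortForCurrentLocale(std::vector<std::string>& words)
{
    std::sort(words.begin(), words.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcoll(a.c_str(), b.c_str()) < 0;
              });
}
```

The C runtime starts in the "C" locale, where strcoll behaves like strcmp, so the application would normally call setlocale(LC_ALL, "") once at startup to pick up the user's settings before sorting.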
Handling Date and Time
Even if your application is only in English, you should display date and time values using the Windows system settings for that user. Just because the user has an English application does not mean they are on English Windows, and there is nothing worse than trying to work with dates and times that are not formatted the way you like.
I try to use COleDateTime and COleDateTimeSpan whenever possible. Formatting COleDateTime with system settings is very easy.
COleDateTime dtToday = COleDateTime::GetCurrentTime();
// Date and time, in the format specified in the user's Windows settings
CString strDateTime = dtToday.Format();
// Date only, in the format specified in the user's Windows settings
CString strDate = dtToday.Format(VAR_DATEVALUEONLY);
// Time only, in the format specified in the user's Windows settings
CString strTime = dtToday.Format(VAR_TIMEVALUEONLY);
Keep in mind that Windows allows the user to set system settings for both the long date and the short date. Here is a function to convert a COleDateTime to a string in the long date format set by the user's system settings.
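CString sfxFormatLongDate(COleDateTime dt)
{
    // Copy the date portion into a SYSTEMTIME; the time portion is not used
    SYSTEMTIME st;
    st.wYear = (WORD)dt.GetYear();
    st.wMonth = (WORD)dt.GetMonth();
    st.wDayOfWeek = 0;
    st.wDay = (WORD)dt.GetDay();
    st.wHour = 0;
    st.wMinute = 0;
    st.wSecond = 0;
    st.wMilliseconds = 0;

    TCHAR sz[255];
    sz[0] = 0; // stays empty if GetDateFormat fails
    ::GetDateFormat(
        LOCALE_USER_DEFAULT, // locale for which date is to be formatted
        DATE_LONGDATE,       // flags specifying function options
        &st,                 // date to be formatted
        NULL,                // date format string
        sz,                  // buffer for storing formatted string
        254);                // size of buffer
    return sz;
}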
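For code that cannot depend on MFC, the C runtime offers a rough equivalent. The sketch below is an illustration under that assumption rather than a replacement for the Windows functions: it formats a date with strftime and the %x specifier, which follows the LC_TIME category of the C locale. FormatLocaleDate is a hypothetical name.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: formats the date part of 'when' using the preferred
// date representation (%x) of the C runtime's current LC_TIME locale.
std::string FormatLocaleDate(const std::tm& when)
{
    char buffer[128];
    const std::size_t written =
        std::strftime(buffer, sizeof(buffer), "%x", &when);
    return std::string(buffer, written);
}
```

As with strcoll, the result only reflects the user's preferences after the application has called setlocale(LC_ALL, "") or set LC_TIME explicitly.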
Bad error messages are one of my biggest complaints about software, even when written in English. But who wants to translate a lot of cryptic error messages that the user will probably not see anyway? Well, you can receive many system error messages in the language of the operating system by using the FormatMessage function, passing to it the result of a call to GetLastError. Refer to the SDK on these functions for further information.
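A minimal sketch of this technique follows. DescribeSystemError is a hypothetical helper name; on Windows it asks FormatMessage for the system's text for an error code such as the one returned by GetLastError. The strerror branch is only an assumption added so that the sketch also compiles on other platforms.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <cstring>
#include <string>

// Hypothetical helper: returns the system's own description of an error code.
// On Windows the text comes from FormatMessage; passing 0 as the language
// lets the system choose, so no translation is needed in the application.
std::string DescribeSystemError(unsigned long code)
{
#ifdef _WIN32
    char* buffer = nullptr;
    const DWORD length = ::FormatMessageA(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM |
            FORMAT_MESSAGE_IGNORE_INSERTS,
        nullptr,                           // no message source: use the system table
        code,                              // error code, e.g. from GetLastError()
        0,                                 // default language lookup
        reinterpret_cast<LPSTR>(&buffer),  // receives a LocalAlloc'ed buffer
        0,
        nullptr);
    std::string text;
    if (length != 0 && buffer != nullptr)
        text.assign(buffer, length);
    if (buffer != nullptr)
        ::LocalFree(buffer);
    return text;
#else
    // Portable fallback for non-Windows builds (an assumption of this sketch).
    return std::strerror(static_cast<int>(code));
#endif
}
```

On Windows a typical call would be DescribeSystemError(::GetLastError()) immediately after the failing API call, before anything else can overwrite the error code.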
How and Why to Build a Unicode App
What are Unicode and MBCS?
Unicode and MBCS are character encodings that allow for more than 256 characters, for languages such as Chinese and Japanese. In Unicode as Windows implements it, every character is normally 2 bytes. In MBCS, some characters are one byte, and some are two, and I think some may be more. This makes it very difficult to determine how many characters are in an MBCS string from its length in bytes. In short, you want to use Unicode, not MBCS!
When using Unicode, the big thing to be careful of is making sure that you pass the number of characters in a string to functions that require the number of characters, and the size in bytes of a string to functions requiring the size in bytes. With ANSI, these values are the same, but not with Unicode.
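Be careful with prototypes whose comments only say "size". This function takes the size of its buffer in characters (TCHARs), not bytes:
UINT GetWindowsDirectory(
LPTSTR lpBuffer, // buffer for Windows directory
UINT uSize // size of directory buffer, in TCHARs
);
This function also requires the number of characters:
TCHAR *_tcsncpy( TCHAR *strDest, const TCHAR *strSource, size_t count );
Functions that work on raw memory, such as memcpy, require the size in bytes, which is the number of characters multiplied by sizeof(TCHAR).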
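The difference between the two sizes is easy to see with a wide-character buffer. The sketch below uses wchar_t and the standard wcsncpy rather than TCHAR and _tcsncpy so that it compiles anywhere; CharCount and CopyTruncated are hypothetical helpers written for this example.

```cpp
#include <cstddef>
#include <cwchar>

// Number of characters (not bytes) that a fixed-size wide buffer can hold.
// sizeof(buffer) would instead give the size in bytes.
template <std::size_t N>
constexpr std::size_t CharCount(const wchar_t (&)[N])
{
    return N;
}

// Copies 'source' into 'dest', truncating if necessary and always
// null-terminating. 'destChars' is a character count, as wcsncpy expects.
void CopyTruncated(wchar_t* dest, std::size_t destChars, const wchar_t* source)
{
    if (destChars == 0)
        return;
    std::wcsncpy(dest, source, destChars - 1);
    dest[destChars - 1] = L'\0';
}
```

Passing sizeof(buffer) where a character count is expected would tell the function the buffer is larger than it really is, which is exactly the kind of overrun that only shows up once the build switches from ANSI to Unicode.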
How to Build A Unicode App
Building a Unicode app is not really difficult. You will have to install the Unicode MFC libraries with Visual C++. I don't believe these are installed by default.
You need to read the article "Unicode Programming Summary" on MSDN. Search for the term wWinMainCRTStartup. It also explains how to handle strings in Unicode, and generic-text functions like _tcslen instead of strlen.
Also, read the article "Unicode, MBCS and Generic text mappings".
I wrote a program that searches through source code for non-Unicode safe function calls and writes the info to a log file. It also replaces all string literals such as TRACE("Hello there"); with Unicode-safe string literals such as TRACE(_T("Hello there"));. It is not clean enough to post here, but I hope to someday. If you have a lot of code to convert to Unicode, developing such a utility is worth your effort.
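The detection half of such a utility can be quite small. The following is not the program described above, only a rough sketch of one possible check: CountUnwrappedLiterals is a hypothetical function that counts string literals on a single source line that are neither wrapped in _T( ) nor already wide (L"..."). It assumes literals do not span lines and ignores block comments.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts string literals on one line of source code that
// are not wrapped in _T(...) and are not wide literals (L"...").
std::size_t CountUnwrappedLiterals(const std::string& line)
{
    // Header names in #include directives are not string literals.
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line.compare(first, 8, "#include") == 0)
        return 0;

    std::size_t count = 0;
    for (std::size_t i = 0; i < line.size(); ++i) {
        // Stop at a line comment.
        if (line[i] == '/' && i + 1 < line.size() && line[i + 1] == '/')
            break;
        // Skip character literals such as '"' so they are not miscounted.
        if (line[i] == '\'') {
            for (++i; i < line.size() && line[i] != '\''; ++i)
                if (line[i] == '\\')
                    ++i;
            continue;
        }
        if (line[i] != '"')
            continue;
        const bool wrapped = i >= 3 && line.compare(i - 3, 3, "_T(") == 0;
        const bool wide = i >= 1 && line[i - 1] == 'L';
        if (!wrapped && !wide)
            ++count;
        // Advance to the closing quote, honoring backslash escapes.
        for (++i; i < line.size() && line[i] != '"'; ++i)
            if (line[i] == '\\')
                ++i;
    }
    return count;
}
```

A real tool would read each source file line by line, log the file name and line number whenever the count is non-zero, and leave the actual rewriting for a later pass once the reports look trustworthy.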
Why to Build a Unicode App
Unicode applications will not run on Windows 95 or 98. They only run on Windows NT and 2000 (actually, an app can use Unicode internally on Win9x, but cannot pass Unicode strings to Windows API calls). The most recent book from Microsoft on internationalization (see the Books section below) recommends that all applications written for Windows 2000 be written in Unicode, no matter what the language.
There is a good reason to make a Unicode version of your English applications available, even if you are not going to localize to Chinese. Suppose you make a graphics application. With Unicode, on Japanese Windows 2000, the user can make the graph titles and text appear in Japanese, even if the application is in English.
Where do I get that Czech version of Windows 98?
You really do need to see and test your final application on Windows of the same language as your translated application. But you can't just go to Wal-Mart to buy Japanese Windows 98, or even order it from MicroWarehouse. It does not seem to be well publicized, but Microsoft offers an MSDN subscription with foreign language versions of all of its operating systems.
You can buy the MSDN subscription through a vendor such as Programmer's Paradise. You basically receive an empty box from them which tells you to call Microsoft to place your order. There are three international packs in MSDN. Last I knew, if you buy Pack 3, you get Packs 1 and 2, so Pack 3 is the one you want. MSDN online has a listing of all the CDs included in each of the international packs. Make sure that you are getting the languages you want. You are well advised to call Microsoft a couple of times to make sure you get the same answer from several different representatives. Even so, they never sent me Russian Windows. After a couple of months, I called to find out why. They said I had to call them to request it; it was not sent automatically. But after I called, they sent it at no additional cost. The bottom line is: make sure you receive all the disks you expect, and call MS if you don't!
Go to MSDN for a list of CDs in the subscription.
How about localized hardware?
MSDN is the source for your Japanese Windows 2000, but what about that Japanese keyboard? This is one of those questions to which I have no good answer. I was able to locate Russian and French keyboards here. Also, last time I talked to Micron, they said they were coming out with computers made for Arabic. As more and more companies are going "e", it should be easier to locate foreign language hardware on the Internet. You are just going to have to look for it.
But does the hardware matter? Testing on localized hardware is probably the least important aspect of the localized testing cycle. However, I understand that Japanese computers have some hardware nuances that may cause problems. Don't take my word for it, do your own research and evaluate the importance of localized hardware testing for your own applications.
What about right-to-left reading languages?
Arabic and Hebrew read right to left, adding another layer of complication for your translated applications. I have never worked with these languages, so I have no further information. I know that there are companies that specialize in Arabic software translation. Search the Internet for more information (coming here was a good start)!
Grammatical Issues with Non-English Languages
The technical issues are not the end of your translation adventure. Most other languages have grammatical complexities that must be considered in translation.
Almost every application has a "New" command on the File menu. New what? Well, new file, of course. In English, it doesn't matter what the new item is; New is always New. But in Spanish, is it Nuevo or Nueva? Here it depends on what the "what" is. If the "what" is masculine, we want Nuevo. If the "what" is feminine, we want Nueva. In many of my applications, there are "New" buttons on many different dialogs. It is important that the translator understand what the new item is so that the correct gender is used.
Of course, you could just decide to use masculine by default. This is a design decision that you will have to make.
Is it a command or a description?
In English, suppose you have a phrase such as "Open this Document". This could be a command, telling the user what to do. It could also be a description of a menu task, such as that which appears in the status bar when you highlight a recently opened file in the menu. In English, the phrase is the same for both the command and the description. But this may not be true for other languages.
I was recently working with a French translator on translating string resources for a product. My string resources included descriptions of menu items that appear in the status bar when you highlight a menu item. String resources also include captions for my Open and Save common dialog boxes, such as "Select A File to Import". The former is a description of a command, the latter is a command. In English, the same text could be a command or a description of a command, but in French, the text would be different depending on the context. It is important that your translator understands the context of each term.
Just recently, we began implementing a possible solution to this problem. Instead of prefixing all string identifiers with IDS_, we are implementing command identifiers with IDSC_ and description identifiers with IDSD_ as shown below:
#define IDSC_OPEN_TEXT_FILE "Open Text File" // this is a command
#define IDSD_OPEN_TEXT_FILE "Open Text File" // this is a description
With this naming scheme, the translator can look at the identifier for the string to determine the context in which the string is used.
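The same prefixes can also be read by a tool, for example to add a context note next to each string in the file sent to the translator. The sketch below assumes only the IDSC_ and IDSD_ prefixes described above; StringRole and RoleFromIdentifier are hypothetical names.

```cpp
#include <string>

// Role of a resource string, as implied by its identifier prefix.
enum class StringRole { Command, Description, Unspecified };

// Hypothetical helper: maps an identifier such as "IDSC_OPEN_TEXT_FILE"
// to the role its prefix announces. Plain IDS_ names carry no role.
StringRole RoleFromIdentifier(const std::string& identifier)
{
    if (identifier.compare(0, 5, "IDSC_") == 0)
        return StringRole::Command;
    if (identifier.compare(0, 5, "IDSD_") == 0)
        return StringRole::Description;
    return StringRole::Unspecified;
}
```

An export script could call this for every entry in the string table and write "command" or "description" into a comment column, so the translator does not have to decode the naming convention by hand.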
Is that a noun or a verb?
In English, many nouns and verbs are the exact same word. For example, the verb "to call" will appear as "Call" on a menu item or button. In this case, it is used as a verb. But "Call" is also a noun. In English, they are the same. But in almost any Romance language, the noun and verb are different. Your translator will always have to know the context in which each term is used.
A potential solution to this problem is to use different string identifier prefixes for nouns or verbs as described above for commands and descriptions.
Books
There are only a few books available on software translations. I have three, and I got them all from amazon.com. Fortunately, they are all quite different and there is very little overlap in subject matter. They are:
- International Programming for Microsoft Windows by David A. Schmitt, published by Microsoft Press, 2000.
- This one just came out at the time of writing, so I haven't read it yet, but it looks pretty good. It spends a huge amount of time on locales. It also covers Unicode well, and the new localization issues with Windows 2000.
- A Practical Guide to Software Localization by Bert Esselink, published by John Benjamins Publishing Company, 1998
- This book spends quite a bit of time covering translation tools, translating on-line help and translating documentation. It also covers Macintosh translation, project management and Visual Basic.
- Internationalization: Developing Software for Global Markets by Tuoc V. Luong et al., published by Wiley, 1995
- The authors came out of Borland when the company was in its prime (OWL used to be preferred over MFC!). This book spends a lot of time discussing how to create your own locale, which most people don't need to do, but which is very useful if you need to do it. It also discusses European and Asian localization specifically, keyboard configurations and Unicode. Don't let the publishing date fool you, the information is still useful.
Finding a Translator
Finding someone to do the translation may be the most difficult aspect of the translation. Not only do you need someone fluent in both languages, but they need to be comfortable with computers as well. If your application is targeted to a specific market, such as the medical industry, you will need a translator who is bilingual in medical terminology as well.
Translation Services: One method is to contract to a translation service. The advantage is that you get experienced and professional translators. The disadvantage is the high cost. There are many companies around the world that offer translation services. Most have web sites that can be located from your favorite search engine.
Independent Translators: Possibly less expensive than a translation company is an independent professional translator. Many can be located on the Internet from web sites that specialize in translation resources, and in translation newsgroups.
The Local University: For low budget translations, you can advertise at a local university. The advantage is that most large universities have many bilingual students, and student translators may work cheaply. The disadvantage is that the students may be unreliable and will probably give their schoolwork higher priority than your translation.
Keep in mind that many translators may want to charge by the number of words. This may be to your disadvantage because of the high levels of repetition of words and phrases typical of software. Also, use of software translation tools can automate much of the translation of the repetitive words and phrases. This also helps to maintain consistency within the translated application.
Glossaries -- What is Microsoft's French term for "status bar" or "print preview"?
Naturally, you want your French application to have a similar look and feel to Microsoft's French applications (OK, maybe you don't). So you will want to know what Microsoft calls the status bar in French, and the French menu item for Print Preview.
Microsoft is nice enough to provide glossaries of all the words and phrases used in their applications, along with the translated term in many different languages. The files are available for free from Microsoft's Web site (but they are quite large), and they are available on MSDN (international pack at least). What you get are comma-separated value (CSV) files for each application (Word, Excel, Outlook, Visual C++, Windows 98, Windows 2K, IE, etc.) in many different languages. You can easily search these files (using VC++ "Find in Files" search, works great) to see, for example, Microsoft's French term for "Disk I/O Error".
The better translation tools will be able to import these glossaries and use them to assist in translation of terms in your application. Be careful about licensing though. The glossaries are the intellectual property of Microsoft, and you cannot just replace all your text with all of theirs. Be sure to read the licensing agreement.
Just because you can write a Thai version of your slick new calendar program doesn't mean you should! Before you jump and start translating all your applications to other languages, ask yourself how you are going to market them. It may not be as easy as you think.
Before spending a lot of money on translation, be sure to investigate the market for your product in that language, and how you will reach that market.
Software localization is a hugely complex issue. There are only a few books available on the topic, they have very little overlap of information, and they still don't answer all of the questions you may have.
This article is intended to provide answers to only the most basic questions, to help you decide if you want to localize your product, and to tell you just what is involved. I don't even discuss such monumental topics as locales and code pages. Even so, this is information that I wish had been easily available to me when I began my first localization projects.
I hope you find it useful, and I hope to post more detailed information on this subject in the future.