Plural Forms

15 Apr 2008, LGPL3
Spelling messages like "5 file(s) found" correctly in any language

Introduction

Messages like "%d file(s) found" are notoriously hard to localize. English has only 2 forms: singular (1 file) and plural (0, 2, or more files), but other languages use up to 4 plural forms. For example, Polish has 3 forms:

    0 plików
    1 plik
  2-4 pliki
 5-21 plików
22-24 pliki
25-31 plików
      etc.

Other languages (French, Russian, Czech, etc.) also use rules different from English and from each other.

The gettext library extracts a rule for plural form selection from the localization file. The rule is a C language expression, which is evaluated for each message. It's a universal solution, but IMHO, an expression evaluator is overkill for this task.
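
For comparison, here is the Polish rule from the gettext documentation written as a plain C function. This is only an illustration: the function name is hypothetical, and gettext itself parses and evaluates the expression at run time rather than compiling it into the program.

```c
/* Polish plural rule, as it appears in a gettext Plural-Forms header:
 *   nplurals=3; plural=(n==1 ? 0 :
 *       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
 * polish_plural_index is a hypothetical name used only for this sketch.
 * Returns 0 for "plik", 1 for "pliki", 2 for "plików". */
unsigned polish_plural_index(unsigned long n)
{
    return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}
```

Supporting every language this way means shipping a parser and evaluator for such expressions, which is the overhead the approach below tries to avoid.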

My Solution

I developed a simpler solution, which works for all languages mentioned on the gettext page. It is based on these observations:

  • Each additional plural form is used for some range of numbers, e.g., from 2 to 4 in Slovak and Czech.
  • The pattern often repeats every 10 or 100 numbers. In Russian, it sounds like "twenty-one file", not "twenty-one files", because the noun agrees with the last digit, "one". The same pattern repeats for 31, 41, etc.
  • The numbers from 10 to 19 (I call them "teens" for short) are often an exception to the rules, just as 16 is spelled differently from 26, 36, 46, etc. in English: "sixteen" vs. "twenty-six", "thirty-six", and "forty-six".
  • Zero is treated differently in some languages, e.g. Romanian.

So, the rule for each plural form will consist of these components:

range_start  range_end  modulo_for_repetition  skip_teens_flag

Here are some examples:

English
singular - range_start = 1, range_end = 1
plural   - all other numbers

Polish
singular - range_start = 1, range_end = 1
plural1  - range_start = 2, range_end = 4, modulo = 10, skip_teens = true
plural2  - all other numbers

Irish
singular - range_start = 1, range_end = 1
plural1  - range_start = 2, range_end = 2
plural2  - all other numbers

Lithuanian
singular - range_start = 1, range_end = 1, modulo = 10, skip_teens = true
plural1  - range_start = 2, range_end = 9, modulo = 10, skip_teens = true
plural2  - all other numbers ("teens" and multiples of 10)
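
The examples above can be checked with a small matcher. This is a minimal sketch, not the code from plurals.c: the struct, the function names, and the convention that modulo = 0 means "no repetition" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* One rule per plural form; the fields follow the components listed above.
 * PluralRule, rule_matches and plural_form are hypothetical names. */
typedef struct {
    unsigned range_start;
    unsigned range_end;
    unsigned modulo;      /* assumed: 0 means the range does not repeat */
    bool     skip_teens;  /* true: numbers ending in 10..19 never match */
} PluralRule;

/* Returns true if n falls under the rule. */
bool rule_matches(const PluralRule *r, unsigned n)
{
    if (r->skip_teens && n % 100 >= 10 && n % 100 <= 19)
        return false;
    unsigned m = r->modulo ? n % r->modulo : n;
    return m >= r->range_start && m <= r->range_end;
}

/* Returns the index of the first matching rule, or count when no rule
 * matches (the "all other numbers" form). */
size_t plural_form(const PluralRule *rules, size_t count, unsigned n)
{
    for (size_t i = 0; i < count; i++)
        if (rule_matches(&rules[i], n))
            return i;
    return count;
}

/* The Polish and Lithuanian examples from the text */
const PluralRule polish_rules[] = {
    { 1, 1, 0,  false },   /* singular */
    { 2, 4, 10, true  },   /* plural1  */
};
const PluralRule lithuanian_rules[] = {
    { 1, 1, 10, true },    /* singular */
    { 2, 9, 10, true },    /* plural1  */
};
```

With these tables, plural_form(polish_rules, 2, 22) selects plural1 and plural_form(lithuanian_rules, 2, 11) falls through to plural2, in agreement with the gettext expressions for both languages.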

The rules for each language can be written as a short string, which is stored in the language file (e.g., for Lithuanian, the string is "1 1 10 t; 2 9 10 t").
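
A parser for such a string might look like the sketch below. The article does not spell out the exact grammar that PluralsReadCfg accepts, so the details here are assumptions: rules are separated by ";", fields by whitespace, and omitted trailing fields default to range_end = range_start, modulo = 0, and skip_teens = false. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule record; not the PLURAL_INFO type from plurals.h */
typedef struct {
    unsigned range_start, range_end, modulo;
    bool skip_teens;
} PluralRule;

/* Parses "start end modulo t; start end modulo t; ..." into rules[].
 * Returns the number of rules read (at most max_rules). */
size_t parse_plural_rules(const char *cfg, PluralRule *rules, size_t max_rules)
{
    size_t count = 0;
    while (count < max_rules) {
        unsigned start = 0, end = 0, modulo = 0;
        char flag = '\0';
        /* sscanf stops at ';', so each call reads one rule at most */
        int fields = sscanf(cfg, "%u %u %u %c", &start, &end, &modulo, &flag);
        if (fields < 1)
            break;                        /* no number found: stop parsing */
        rules[count].range_start = start;
        rules[count].range_end   = fields >= 2 ? end : start;
        rules[count].modulo      = fields >= 3 ? modulo : 0;
        /* %c may pick up the ';' separator, so check for 't' explicitly */
        rules[count].skip_teens  = fields >= 4 && flag == 't';
        count++;

        const char *sep = strchr(cfg, ';');
        if (sep == NULL)
            break;                        /* last rule in the string */
        cfg = sep + 1;
    }
    return count;
}
```

Under these assumptions, the Lithuanian string yields two rules, and a one-field string such as "1" yields a single rule that matches only the number 1.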

Using the Code

To use my solution, include plurals.h and plurals.c in your project. The interface consists of two functions. First, you call PluralsReadCfg to read rules from the string. Next, you pass a number to PluralsGetForm. It returns the index of the correct plural form for this number, which you use to read the string from your language file:

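PLURAL_INFO plurals;
/* Parse the rule string once, e.g. at startup */
PluralsReadCfg(&plurals, ReadFromLngFile("PluralRules"));

char lng_str_name[16], message[128];
/* Build the string name, e.g. "FilesFound0" or "FilesFound1" */
sprintf(lng_str_name, "FilesFound%d", PluralsGetForm(&plurals, number));
/* Load the string for that form and insert the number */
sprintf(message, ReadFromLngFile(lng_str_name), number);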

In the language file, you have strings for each plural form:

PluralRules = "1"
FilesFound0 = "%d file found"
FilesFound1 = "%d files found"

ReadFromLngFile is your own function. You could wrap the two sprintf calls in a higher-level function (and, of course, use a bounds-checked function instead of sprintf to protect your program from buffer overflows).
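
One way to write such a wrapper is sketched below, using C99 snprintf as the bounds-checked function. To keep the example self-contained, DemoGetForm and DemoReadString are hypothetical stand-ins for PluralsGetForm and ReadFromLngFile; FormatPlural is likewise a made-up name, not part of plurals.h.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical English-only stand-in for PluralsGetForm:
 * 0 = singular, 1 = plural */
int DemoGetForm(int number)
{
    return number == 1 ? 0 : 1;
}

/* Hypothetical stand-in for your own language-file lookup */
const char *DemoReadString(const char *name)
{
    if (strcmp(name, "FilesFound0") == 0) return "%d file found";
    if (strcmp(name, "FilesFound1") == 0) return "%d files found";
    return "";
}

/* Builds the string name "<base><form index>", looks the string up, and
 * formats number into out without writing past out_size bytes.
 * Returns the length the full message needs, or a negative value on error.
 * The format string comes from the language file, so that file must be
 * trusted to contain exactly one %d. */
int FormatPlural(char *out, size_t out_size, const char *base, int number)
{
    char name[64];
    int len = snprintf(name, sizeof name, "%s%d", base, DemoGetForm(number));
    if (len < 0 || (size_t)len >= sizeof name)
        return -1;                       /* string name did not fit */
    return snprintf(out, out_size, DemoReadString(name), number);
}
```

A call such as FormatPlural(message, sizeof message, "FilesFound", number) then replaces both sprintf calls, and a return value of at least sizeof message signals that the text was truncated.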

Conclusion

Two functions, PluralsReadCfg and PluralsGetForm, take 500 bytes in your executable file when compiled with MSVC++. This is a small price to pay for spelling your messages correctly in any language.

History

  • 15th April, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)

About the Author

Peter Kankowski
Software Developer
Russian Federation
Peter lives in Siberia, the land of sleeping sun, beautiful mountains, and infinitely deep snow. He recently started a wiki about algorithms and code optimization, where people could share their ideas, learn, and teach others.

Article Copyright 2008 by Peter Kankowski