Posted 10 Aug 2010

Develop Your Own Language Translation System

An introduction to Example Based Machine Translation (EBMT) systems and how to create your own using existing tools


This article describes the development of an Example Based Machine Translation (EBMT) system using Java on the Linux platform for translating from one language to another. In this particular case, I translate English sentences to Hindi. The principle behind EBMT is simple: the system decides on an appropriate translation of an input sentence by analyzing the pre-translated sentences in its database. Therefore, the larger the database of pre-translated sentences, the greater the accuracy of the EBMT system is likely to be.

This article is greatly inspired by the works of Ralf Brown and Balakrishnan who have done extensive research in this field.

Introduction and Background

Example based translation is essentially translation by analogy. This means that if an EBMT system is given a set of sentences in the source language (the language one is translating from) and their corresponding translations in the target language, the system can use these examples to translate other, similar source language sentences into the target language. The basic premise is that if a previously translated sentence occurs again, the same translation is likely to be correct again.

Software Used

Developing your own machine translation system is a difficult task. However, there are some tools that can help accelerate the process. I used the following tools in my EBMT system:

  1. Moses Decoder
  2. Giza++
  3. IRST-LM

Block Diagram



I divided the entire EBMT system into four modules.

1. Module I: Exact Match Algorithm

In this module, the input English sentence is first compared against every sentence in the available bilingual corpus for an exact match. If a match is found, the corresponding Hindi sentence is retrieved and displayed as output.

When the input is a paragraph, it is first broken down into sentences, and each sentence is then translated one by one.
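
The exact-match step can be sketched roughly as follows. This is only a minimal illustration, not the code of my system: the class name `ExactMatchModule`, the in-memory map, and the normalization (lower-casing and dropping the final punctuation) are assumptions made for the example. Sentence splitting uses the JDK's `java.text.BreakIterator`.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of Module I: exact-match lookup in a bilingual corpus.
class ExactMatchModule {
    // Maps a normalized English sentence to its stored Hindi translation.
    private final Map<String, String> corpus = new HashMap<>();

    // Assumed normalization: trim, lower-case, drop final punctuation, collapse spaces.
    static String normalize(String sentence) {
        return sentence.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[.!?]+$", "")
                .replaceAll("\\s+", " ");
    }

    void addExample(String english, String hindi) {
        corpus.put(normalize(english), hindi);
    }

    // Returns the stored translation, or empty so that the next module can take over.
    Optional<String> translate(String sentence) {
        return Optional.ofNullable(corpus.get(normalize(sentence)));
    }

    // Breaks a paragraph into sentences using the JDK sentence iterator.
    static List<String> splitSentences(String paragraph) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = paragraph.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        ExactMatchModule module = new ExactMatchModule();
        module.addExample("Anshul plays football", "Anshul football khelta hai");
        for (String s : splitSentences("Anshul plays football. He likes it.")) {
            System.out.println(s + " -> " + module.translate(s).orElse("[no exact match]"));
        }
    }
}
```

A hash map makes each lookup independent of corpus size, which is one possible alternative to comparing the input with every stored sentence in turn.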

2. Module II: Sentence Rule Based Translation

Every language has a grammar that describes how the words in a sentence should be organized. For instance, consider English vs. Hindi. English follows Subject-Verb-Object (SVO) word order while Hindi follows Subject-Object-Verb (SOV) word order. To illustrate this, compare the following two sentences:

English: Anshul plays football

Hindi: Anshul football khelta hai

This module converts the input sentence into a tokenized format in which specific words are replaced by placeholders. For example, the above English sentence is converted to

<Subject> plays <Object>

This helps in generalizing the translation process.

Besides this, there are many other linguistic rules that must be taken into consideration while translating sentences.
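
One way to picture the template idea is the sketch below. It is only an illustration under strong assumptions: the class `SentenceRuleModule`, the rule format, and the restriction that each placeholder matches a single word are all hypothetical, and the captured words are copied over untranslated, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Module II: template rules that reorder SVO into SOV.
class SentenceRuleModule {
    // A placeholder looks like <Subject> or <Object>.
    private static final Pattern SLOT = Pattern.compile("<[A-Za-z]+>");

    private static final class Rule {
        final Pattern source;      // regex built from the source template
        final List<String> slots;  // placeholder names, in source order
        final String target;       // target template using the same placeholders

        Rule(Pattern source, List<String> slots, String target) {
            this.source = source;
            this.slots = slots;
            this.target = target;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Example: addRule("<Subject> plays <Object>", "<Subject> <Object> khelta hai")
    void addRule(String sourceTemplate, String targetTemplate) {
        List<String> slots = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher slot = SLOT.matcher(sourceTemplate);
        int last = 0;
        while (slot.find()) {
            // Literal text between placeholders is quoted; each placeholder captures one word.
            regex.append(Pattern.quote(sourceTemplate.substring(last, slot.start())));
            regex.append("(\\S+)");
            slots.add(slot.group());
            last = slot.end();
        }
        regex.append(Pattern.quote(sourceTemplate.substring(last)));
        rules.add(new Rule(Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE), slots, targetTemplate));
    }

    // Applies the first matching rule; empty means no rule matched.
    Optional<String> translate(String sentence) {
        for (Rule rule : rules) {
            Matcher m = rule.source.matcher(sentence.trim());
            if (m.matches()) {
                String out = rule.target;
                for (int i = 0; i < rule.slots.size(); i++) {
                    out = out.replace(rule.slots.get(i), m.group(i + 1));
                }
                return Optional.of(out);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SentenceRuleModule module = new SentenceRuleModule();
        module.addRule("<Subject> plays <Object>", "<Subject> <Object> khelta hai");
        System.out.println(module.translate("Anshul plays football").orElse("[no rule matched]"));
    }
}
```

A real rule set would also have to handle multi-word subjects and objects, agreement, and the other linguistic rules mentioned above; this sketch covers only the reordering.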

3. Module III: Phrase Decoder

When the first two modules fail to translate the input, we divide the sentence into phrases and run algorithms based on statistical machine translation on them to find the most probable translation of the input sentence.

Mathematically, we try to find out:

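H* = arg max<sub>H</sub> P(H/E)            (1)

Here P(H/E) denotes the conditional probability of a Hindi sentence H given the English input E, and H* is the Hindi sentence that maximizes it. I know this sounds complicated, so let me explain how we arrive at a form of this equation that can be computed.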

According to Bayes' theorem from probability theory,

P(A/B) = P(B/A) * P(A) / P(B)

In this case, we need to find the translated sentence A that has the maximum probability of being the correct translation of a given input sentence B. Since B is fixed while we search over the candidates A, P(B) is a constant and does not affect which A attains the maximum.

Thus, we want:

=> A* = arg max<sub>A</sub> P(A/B)

=> A* = arg max<sub>A</sub> P(B/A) * P(A) / P(B)

=> A* = arg max<sub>A</sub> P(B/A) * P(A)         (same as (1), with A = H and B = E)

This module tries to find the most probable Hindi translation of an English sentence by finding, for each English phrase E, the Hindi phrase H that maximizes P(E/H) * P(H). The translated phrases are then joined together to form the complete sentence.


  • P(H) = [language model probability]:

    I used the IRST Language Model, which measures the fluency of a Hindi sentence, i.e. how probable it is as Hindi text. It scores candidate Hindi sentences so that fluent ones are preferred as potential translations.

  • P(E/H) = [translation model probability H->E]:

    I used Giza++ to estimate this model, which measures faithfulness: the probability of the English sentence given a candidate Hindi sentence. It tests whether a given fluent sentence is actually a translation of the input.

  • arg max<sub>H</sub>

    I used the Moses Decoder, which uses heuristic search to find H* effectively and efficiently.
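
To make the arg max concrete, here is a toy sketch that scores a handful of candidate Hindi sentences by P(E/H) * P(H) and keeps the best one. It is not how Moses, Giza++ or IRST are invoked: the class `NoisyChannelDemo` is hypothetical and the probabilities in `main` are invented purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of H* = arg max_H P(E/H) * P(H) over a small candidate list.
class NoisyChannelDemo {
    // translationModel: candidate H -> P(E/H) for one fixed English input E.
    // languageModel:    candidate H -> P(H).
    // Returns the best candidate, or null if every candidate has zero probability.
    static String bestTranslation(Map<String, Double> translationModel,
                                  Map<String, Double> languageModel) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : translationModel.entrySet()) {
            String candidate = entry.getKey();
            double pEGivenH = entry.getValue();
            double pH = languageModel.getOrDefault(candidate, 0.0);
            // Sum of logs instead of a product, to avoid underflow on long sentences.
            double score = Math.log(pEGivenH) + Math.log(pH);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Invented numbers for the input "Anshul plays football".
        Map<String, Double> tm = new LinkedHashMap<>();
        tm.put("Anshul football khelta hai", 0.4);
        tm.put("Anshul khelta hai football", 0.5);

        Map<String, Double> lm = new LinkedHashMap<>();
        lm.put("Anshul football khelta hai", 0.05);   // fluent SOV order
        lm.put("Anshul khelta hai football", 0.001);  // unnatural word order

        System.out.println(bestTranslation(tm, lm));
    }
}
```

With these made-up numbers, the second candidate is slightly more faithful but far less fluent, so the product favors the SOV sentence. A real decoder cannot enumerate every candidate, which is why heuristic search is needed.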

4. Module IV: Word Decoder

This is the last attempt by the EBMT system to translate the input sentence. When Module III also fails to translate, the system breaks the sentence into words. For every word, it looks up the dictionary translation and simply stitches the outputs together into a translated sentence.
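
A word-by-word fallback of this kind might look like the following sketch. The class `WordDecoderModule`, its tiny dictionary, and the choice to pass unknown words through unchanged are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of Module IV: dictionary lookup for each word.
class WordDecoderModule {
    private final Map<String, String> dictionary = new HashMap<>();

    void addEntry(String english, String hindi) {
        dictionary.put(english.toLowerCase(Locale.ROOT), hindi);
    }

    // Translates each word independently and joins the results with spaces.
    // Words missing from the dictionary are copied through unchanged (an assumption).
    String translate(String sentence) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : sentence.trim().split("\\s+")) {
            out.add(dictionary.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        WordDecoderModule module = new WordDecoderModule();
        module.addEntry("plays", "khelta hai");
        // prints "Anshul khelta hai football"
        System.out.println(module.translate("Anshul plays football"));
    }
}
```

Note that the output keeps the English word order, so it differs from the correct "Anshul football khelta hai". This is why word-level lookup is only the last resort.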

Setup of EBMT

Basic preparation of an EBMT system requires you to do the following:

  1. Develop a bilingual corpus of pre-translated sentences from the source language to the destination language.
  2. Once you have a decent-sized corpus, install Giza++, Moses and IRST on your system.
  3. IRST requires a monolingual file as well. This can easily be created by separating the two sides of the bilingual corpus.
  4. Finally, you need to train on your corpus with Giza++. At the back end, shell scripts and Perl scripts run that compute probabilities and generate various files such as the alignment file, translation table, fertility file, distortion table, etc.
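
Step 3 can be sketched as below, assuming (and this is only an assumption about the file layout) that each line of the bilingual corpus holds an English sentence and its Hindi translation separated by a tab. The class `CorpusSplitter` is hypothetical and works on in-memory lines. Since the language model estimates P(H), the Hindi side is the one that would be used as the monolingual file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: separates a tab-delimited bilingual corpus into two monolingual lists.
class CorpusSplitter {
    // Returns [englishSentences, hindiSentences]; the two lists stay line-aligned.
    static List<List<String>> split(List<String> bilingualLines) {
        List<String> english = new ArrayList<>();
        List<String> hindi = new ArrayList<>();
        for (String line : bilingualLines) {
            String[] parts = line.split("\t", 2);
            // Skip malformed lines so that both sides keep the same length.
            if (parts.length < 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                continue;
            }
            english.add(parts[0].trim());
            hindi.add(parts[1].trim());
        }
        return Arrays.asList(english, hindi);
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "Anshul plays football\tAnshul football khelta hai",
                "a line without a translation");
        List<List<String>> sides = split(corpus);
        System.out.println("English side: " + sides.get(0));
        System.out.println("Hindi side:   " + sides.get(1));
    }
}
```

For real files, `java.nio.file.Files.readAllLines` and `Files.write` could supply and store the lines.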


Training with Giza++ took 1.5 days, after which my EBMT system was ready!


Future Work

Machine translation is a research field with a lot of work already done and a lot more yet to be done. I merely demonstrated how you can use existing tools to create your own machine translation system. This is my first step towards innovation and I have a long way to go...


  • 11th August, 2010: Initial post


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Last Updated 11 Aug 2010
Article Copyright 2010 by anshulskywalker