Translation Web Service in C#

By Matthew Brealey

C# Web Service to translate text using Babelfish.
C#, .NET 1.0, .NET 1.1, Win2K, WinXP, Win2003, ASP.NET, VS.NET2003, Dev
Posted:4 Jan 2004
Updated:18 Apr 2004
Introduction

Most people are now aware that most of the world does not speak English, though it is easy to forget this when surfing the web: Google gives English results to people searching in English, so you conveniently miss all the pages in German, French, Italian and so on.

The popularity of Altavista's famous Babelfish service is therefore hardly surprising - converting text or web pages into other languages is a useful thing to do.

For a while, anyone looking to integrate translation into their app simply had to plug in the Babelfish WSDL. Posters to newsgroups were directed to the free service from xmethods, a good source for a variety of web services (SMS, etc.). In fact, the Babelfish WSDL is the 9th hit on Google for WSDL.

So I plugged it into my apps, intranet, extranet and anything else that vaguely looked like it would benefit from a translation service. And life was good.

But one day the service stopped working, apparently for good. So I had to write a replacement. And here it is.

Code

This is a pretty simple job, and can be broken down into the following subtasks:

  1. Get text for translation and encode it into an HTTP POST request
  2. Send the data to the web server, acting in effect as a .NET web browser
  3. Read the response back into a big string
  4. Remove all the HTML and formatting and send the raw translated string back to the client.

So fire up Visual Studio .NET, create an ASP.NET Web Service named Translation, and add a Translate.asmx file. The method takes two inputs: the translation mode (e.g., French to English), and the data to be translated (e.g., 'the quick brown fox jumps over the lazy dog'). To make it a plug-in replacement for the old service, I gave my method the same name and parameters as the old one:

[WebMethod]
public string BabelFish(string translationmode, string sourcedata) 
{
    // body is built up in the steps that follow
}

The translation modes can be found in the source of the page at Babelfish:

readonly string[] VALIDTRANSLATIONMODES = new string[] 
 {"en_zh", "en_fr", "en_de", "en_it", "en_ja", "en_ko", "en_pt", "en_es", 
 "zh_en", "fr_en", "fr_de", "de_en", "de_fr", "it_en", "ja_en", "ko_en", 
 "pt_en", "ru_en", "es_en"};

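As a rough illustration of the mode check, a minimal sketch might look like the following. The class and method names (TranslationModes, IsValid) are hypothetical and not taken from the download; only the list of modes comes from the article.

```csharp
using System;

public static class TranslationModes
{
    // Same values as the VALIDTRANSLATIONMODES array in the article.
    private static readonly string[] ValidModes = new string[]
    {
        "en_zh", "en_fr", "en_de", "en_it", "en_ja", "en_ko", "en_pt", "en_es",
        "zh_en", "fr_en", "fr_de", "de_en", "de_fr", "it_en", "ja_en", "ko_en",
        "pt_en", "ru_en", "es_en"
    };

    // Hypothetical helper: true only for an exact, case-sensitive match.
    public static bool IsValid(string mode)
    {
        return mode != null && Array.IndexOf(ValidModes, mode) >= 0;
    }
}
```

A web method could call TranslationModes.IsValid(translationmode) first and return an error string straight away when it is false, so that nothing unexpected ends up in the lp= field of the request.
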
The code checks that the requested mode is in this list before passing it on to Babelfish. After that, we create a POST request. The syntax for an HTTP POST request looks something like this:

POST /babelfish/tr/ HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 51

lp=en_fr&tt=urltext&intl=1&doit=done&urltext=cheese
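The Content-Length of 51 in the request above is simply the number of bytes in the body. The sketch below rebuilds that body and measures it. PostBody, Build and ContentLength are hypothetical names, and Uri.EscapeDataString is used as a stand-in for HttpUtility.UrlEncode so that the sketch needs no System.Web reference (the two differ in detail: for instance, UrlEncode writes a space as + while EscapeDataString writes %20).

```csharp
using System;
using System.Text;

public static class PostBody
{
    // Hypothetical helper that assembles the form body shown in the sample request.
    public static string Build(string translationMode, string sourceData)
    {
        return "lp=" + translationMode +
               "&tt=urltext&intl=1&doit=done&urltext=" +
               Uri.EscapeDataString(sourceData);
    }

    // Content-Length counts bytes on the wire, so measure the UTF-8 encoded bytes
    // rather than the number of characters in the string.
    public static int ContentLength(string body)
    {
        return Encoding.UTF8.GetByteCount(body);
    }
}
```

For the mode "en_fr" and the text "cheese", Build returns the body shown in the sample request and ContentLength reports 51.
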

It's pretty simple, and if you wanted to, you could use low-level sockets to write the data to the server. The .NET Framework provides a better way, however: the HttpWebRequest class, which has lots of built-in features to make it easy to work with HTTP connections.

Uri uri = new Uri(BABELFISHURL);
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(uri);
request.Referer = BABELFISHREFERER;
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

// Build the form body, URL-encoding the source text
string postsourcedata = "lp=" + translationmode + 
    "&tt=urltext&intl=1&doit=done&urltext=" + 
    HttpUtility.UrlEncode(sourcedata);

// Content-Length must be the number of bytes sent, not the number of characters
UTF8Encoding encoding = new UTF8Encoding();
byte[] bytes = encoding.GetBytes(postsourcedata);
request.ContentLength = bytes.Length;

// Write the body to the request stream
Stream writeStream = request.GetRequestStream();
writeStream.Write(bytes, 0, bytes.Length);
writeStream.Close();

// Read the whole response page into one string
HttpWebResponse response = (HttpWebResponse) request.GetResponse();
Stream responseStream = response.GetResponseStream();
StreamReader readStream = new StreamReader (responseStream, Encoding.UTF8);
string page = readStream.ReadToEnd();
readStream.Close();
response.Close();

We end up with a string containing the entire Babelfish page. As it stands, this is about 99% noise (HTML tags, Altavista information, etc.), and 1% the translation we were looking for. So we need a regular expression to find the translated text. By looking at the HTML page, you will find the translation is contained between:

<Div style=padding:10px; lang=fr>translation here</div>

So the required regular expression looks like this (note: while testing my regular expressions, I got lots of help from Regulator):

<Div style=padding:10px; lang=..>((?:.|\n)*?)</div>

This will match the whole <div>...</div> string. This is a fairly complex regular expression, but basically, the . character matches everything, except for newlines, hence the (.|\n) pattern, which means any character (except newlines) or new lines.

The outer brackets create a matching group, meaning that the text within them (namely the translation) will be put in its own group at index 1 (index 0 contains the whole match).

The ?: prefix suppresses grouping. Brackets normally create a matching group, but here the inner pair is only there to allow for line breaks in long translations, so it does not need a group of its own.

Finally, *? is a lazy quantifier: it matches as few characters as possible, stopping at the first instance of </div>. (If I had used plain *, the expression would be greedy, and would chomp right up to the LAST </div>.)
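To see the difference between the lazy and greedy forms, here is a small sketch that runs both against a made-up fragment of HTML. TranslationExtractor and Extract are hypothetical names, and the sample page in the note below is invented for illustration; it is not real Babelfish output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TranslationExtractor
{
    // Lazy form from the article: group 1 stops at the first </div>.
    private const string Lazy =
        @"<Div style=padding:10px; lang=..>((?:.|\n)*?)</div>";

    // Greedy variant, for comparison only: group 1 runs to the last </div>.
    private const string Greedy =
        @"<Div style=padding:10px; lang=..>((?:.|\n)*)</div>";

    // Returns the contents of group 1, or null when the pattern does not match.
    public static string Extract(string page, bool lazy)
    {
        Match m = Regex.Match(page, lazy ? Lazy : Greedy);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Given the page "<Div style=padding:10px; lang=fr>le fromage\nbleu</div><div>footer</div>", the lazy form returns just the two-line translation, while the greedy form also swallows "</div><div>footer", because it only stops at the final </div>.
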

Here's the code:

Regex reg = new Regex(@"<Div style=padding:10px; lang=..>((?:.|\n)*?)</div>");
MatchCollection matches = reg.Matches(page);
if (matches.Count != 1 || matches[0].Groups.Count != 2) 
{
    return ERRORSTRINGSTART + "The HTML returned from Babelfish " + 
        "appears to have changed. Please check for" + 
        " an updated regular expression" + 
        ERRORSTRINGEND;
}
return matches[0].Groups[1].Value;

And subject to error checking, that's it!
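As for the error checking itself, one possible shape is sketched below: the network call is wrapped so that a connection failure comes back as an error string rather than an exception. This is an assumption about how such a wrapper could look, not the code from the download. SafeTranslate, PageFetcher and Run are hypothetical names, and the error markers are placeholders, since the values of ERRORSTRINGSTART and ERRORSTRINGEND are not shown in the article.

```csharp
using System;
using System.Net;

public static class SafeTranslate
{
    // Placeholder markers; the article's ERRORSTRINGSTART / ERRORSTRINGEND
    // values are not shown, so these are assumptions.
    private const string ErrorStart = "[error: ";
    private const string ErrorEnd = "]";

    // Stands in for the request/response code that talks to the server.
    public delegate string PageFetcher();

    // Runs the fetch and turns a network failure into an error string.
    public static string Run(PageFetcher fetch)
    {
        try
        {
            return fetch();
        }
        catch (WebException ex)
        {
            return ErrorStart +
                   "There was a problem connecting to the Babelfish server: " +
                   ex.Message + ErrorEnd;
        }
    }
}
```

Returning a marked-up string keeps the method's signature compatible with the old service, at the cost of making callers inspect the text to tell a translation from a failure.
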

Using it

Download the code, and unzip it somewhere. Add a virtual directory called Translation in IIS. Go to /translate.asmx and click Test, and enter some test data (say 'en_fr', and 'cheese'). If it works, you are ready to use it in your web and Windows Forms applications.

To use it in your app, add a Web Reference to the asmx in the project that needs it. Visual Studio will create a proxy class for you, which you can then use to perform translation.

Here's some sample code-behind:

namespace test
{
    using System;
    using System.Data;
    using System.Drawing;
    using System.Web;
    using System.Web.UI.WebControls;
    using localhost1; // assuming that's the reference generated
    using System.Web.UI.HtmlControls;

    /// <summary>
    ///     Summary description for WebUserControl1.
    /// </summary>
    public class WebUserControl1 : System.Web.UI.UserControl
    {
        protected System.Web.UI.WebControls.DropDownList ddTranslationMode;
        protected System.Web.UI.WebControls.TextBox txtText;
        protected System.Web.UI.WebControls.Label lblTranslation;
        protected System.Web.UI.WebControls.Button submitButton;

        private void Page_Load(object sender, System.EventArgs e)
        {
            // Put user code to initialize the page here

        }

        protected void submitButton_Click(object sender, System.EventArgs e) 
        {
            string translationMode = 
                this.ddTranslationMode.SelectedItem.Value;
            string translationText = this.txtText.Text.Trim();
            string translation = "";
            try 
            {
                Translate tr = new Translate();
                translation = tr.BabelFish(translationMode,translationText);
            }
            catch (Exception exp) 
            {
                translation = "There was an error accessing the server: " 
                                                             + exp.Message;
            }
            this.lblTranslation.Text = translation;
        }
    }
}

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Matthew Brealey

Location: United Kingdom

Editor: Nishant Sivakumar
Copyright 2004 by Matthew Brealey