We have around 20 websites. Management is expanding overseas and wants all the sites to be multilingual, yesterday, of course. The hard part is that some of the content is user-driven. Faced with a future of endlessly tagging strings into resource files, I needed a better solution. Being a programmer, I wanted to build something that would make all my problems go away: something I could drop into any site without touching much of its source code. It also needed to translate the pages by itself, and to learn new words whenever users updated content.
Making it Universal
Like it Raw
Actually, getting at the raw output turns out to be a bit of a trick. The only place you can get hold of it is by creating a filter stream and assigning it to the Response object's Filter property, as follows…
' Wrap the current output stream in the translating filter
Dim f As Stream = _Context.Response.Filter
Dim sr As TranslateFilterToDom = New TranslateFilterToDom(f)
sr.Language = GetUserLanguage()
_Context.Response.Filter = sr
You create a class that inherits from Stream. Most of its members you can leave alone; just fill in the required overrides. The one that does change is the Write method. Because this is a stream, the HTML arrives in chunks, so we keep buffering until the closing </html> tag shows up:
Public Overrides Sub Write(ByVal Buffer() As Byte, _
    ByVal offset As Integer, ByVal count As Integer)
    ' Accumulate each chunk; the page is only complete once </html> arrives
    Dim sBuffer As String = System.Text.Encoding.UTF8.GetString(Buffer, offset, count)
    responseHtml.Append(sBuffer)
    Dim rHTML As Regex = New Regex("</html>", RegexOptions.IgnoreCase)
    If Not rHTML.IsMatch(sBuffer) Then Return
    Dim finalHtml As String = responseHtml.ToString()
    If Language <> "en" Then
        ' Drop anything before <html> (e.g. the DOCTYPE) and patch common
        ' non-XHTML markup so XmlDocument will accept the page
        finalHtml = finalHtml.Substring(finalHtml.ToLower.IndexOf("<html"))
        finalHtml = finalHtml.Replace("&nbsp;", "&#160;")
        finalHtml = finalHtml.Replace("<br>", "<br/>")
        finalHtml = finalHtml.Replace("<hr>", "<hr/>")
        ' Escape every & so entities survive the XML round trip
        finalHtml = finalHtml.Replace("&", "&amp;")
        finalHtml = finalHtml.Replace("<head>", "<head>" & META)
        xml = New XmlDocument
        xml.LoadXml(finalHtml)
        Dim xmldecl As XmlDeclaration = xml.CreateXmlDeclaration("1.0", "UTF-8", Nothing)
        xml.InsertBefore(xmldecl, xml.DocumentElement)
        finalHtml = Translate(xml)
        ' Undo the escaping applied above
        finalHtml = finalHtml.Replace("&amp;", "&")
    End If
    Dim data As Byte() = System.Text.Encoding.UTF8.GetBytes(finalHtml)
    _sink.Write(data, 0, data.Length)
End Sub
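The Write override relies on the rest of the Stream subclass. Below is a minimal, self-contained sketch of that skeleton, assuming a class named BufferingHtmlFilter (a hypothetical stand-in for TranslateFilterToDom) that takes the page transformation as a Func(Of String, String) instead of doing the XML work inline. Unlike the per-chunk check above, it searches the whole accumulated buffer, so a </html> tag split across two chunks is still found, and it buffers raw bytes so multi-byte UTF-8 characters are never cut in half.

```vb
Imports System
Imports System.IO
Imports System.Text

' Hypothetical sketch of a response filter: buffers the page until
' </html> arrives, transforms it once, then writes it to the real stream.
Public Class BufferingHtmlFilter
    Inherits Stream

    Private ReadOnly _sink As Stream
    Private ReadOnly _transform As Func(Of String, String)
    Private ReadOnly _bytes As New MemoryStream()

    Public Sub New(sink As Stream, transform As Func(Of String, String))
        _sink = sink
        _transform = transform
    End Sub

    Public Overrides Sub Write(buffer As Byte(), offset As Integer, count As Integer)
        ' Keep raw bytes so characters split across chunks decode correctly
        _bytes.Write(buffer, offset, count)
        Dim html As String = Encoding.UTF8.GetString(_bytes.ToArray())
        ' Search the whole buffer, so a tag split across chunks is found
        If html.IndexOf("</html>", StringComparison.OrdinalIgnoreCase) < 0 Then Return
        Dim data As Byte() = Encoding.UTF8.GetBytes(_transform(html))
        _sink.Write(data, 0, data.Length)
        _bytes.SetLength(0)
    End Sub

    Protected Overrides Sub Dispose(disposing As Boolean)
        If disposing Then
            ' Output that never reached </html> passes through unchanged
            If _bytes.Length > 0 Then
                _sink.Write(_bytes.ToArray(), 0, CInt(_bytes.Length))
                _bytes.SetLength(0)
            End If
            _sink.Dispose()
        End If
        MyBase.Dispose(disposing)
    End Sub

    ' The rest is the boilerplate "footprint" of a write-only stream
    Public Overrides Sub Flush()
        _sink.Flush()
    End Sub

    Public Overrides ReadOnly Property CanRead As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property Length As Long
        Get
            Throw New NotSupportedException()
        End Get
    End Property

    Public Overrides Property Position As Long
        Get
            Throw New NotSupportedException()
        End Get
        Set(value As Long)
            Throw New NotSupportedException()
        End Set
    End Property

    Public Overrides Function Read(buffer As Byte(), offset As Integer, count As Integer) As Integer
        Throw New NotSupportedException()
    End Function

    Public Overrides Function Seek(offset As Long, origin As SeekOrigin) As Long
        Throw New NotSupportedException()
    End Function

    Public Overrides Sub SetLength(value As Long)
        Throw New NotSupportedException()
    End Sub
End Class
```

It would be wired up the same way as the filter above, for example _Context.Response.Filter = New BufferingHtmlFilter(_Context.Response.Filter, AddressOf TranslatePage), where TranslatePage is a hypothetical function wrapping the XML steps.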
The idea here is to parse the HTML on its way to the client. This way, we don't have to worry about server controls, and we don't have to change any code in the application. The problem is how to parse it. If the HTML is XHTML-compliant, we can simply load it into an XmlDocument and walk that. Then we pass the document to the Translate function:
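Private Function Translate(ByVal poxml As XmlDocument) As String
    Dim loNode As XmlNode = DirectCast(poxml.DocumentElement, XmlNode)
    TranslateNode(loNode) ' start the recursive walk at the root element
    If _IsManaul Then
        ' MT couldn't handle this language: add the manual-translation script
        Dim loHead As XmlNode = poxml.GetElementsByTagName("head")(0)
        Dim xmlScript As XmlElement = poxml.CreateElement("script")
        xmlScript.InnerXml = GetManualScript()
        loHead.AppendChild(xmlScript)
    End If
    Return poxml.OuterXml
End Function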
So here we are just kicking off the recursion over the XmlDocument. We also handle the case where the MT program can't translate the user's selected language, which is what the _IsManaul flag (I know it's misspelled in the code) is all about: when it is set, a script is injected into the page's head so users can supply the translation themselves.
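Translate assumes the page already loaded into an XmlDocument, which is what the string fixes in Write are for. As a minimal sketch, here are those fixes pulled into hypothetical helpers (PrepareForXml, RestoreFromXml and RoundTrip are illustrative names, not part of the module). It shows why escaping every & before loading and unescaping once after serializing leaves entities such as &nbsp; and &amp; intact.

```vb
Imports System
Imports System.Xml

' Hypothetical helpers isolating the string fixes from Write: make a page
' loadable by XmlDocument, then undo the escaping after serialization.
Public Module XhtmlRoundTrip
    Public Function PrepareForXml(html As String) As String
        ' Drop anything before <html>, such as the DOCTYPE
        Dim start As Integer = html.IndexOf("<html", StringComparison.OrdinalIgnoreCase)
        If start > 0 Then html = html.Substring(start)
        ' Close the common void tags that older HTML leaves open
        html = html.Replace("<br>", "<br/>").Replace("<hr>", "<hr/>")
        ' Escape every &, so HTML-only entities such as &nbsp; load as text
        Return html.Replace("&", "&amp;")
    End Function

    Public Function RestoreFromXml(serialized As String) As String
        ' The serializer re-escapes & in text and attributes; undo one level
        Return serialized.Replace("&amp;", "&")
    End Function

    Public Function RoundTrip(html As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(PrepareForXml(html))
        ' ...translating the nodes of doc would happen here...
        Return RestoreFromXml(doc.OuterXml)
    End Function
End Module
```

The approach only works for pages that are already close to XHTML; unclosed tags other than br and hr will still make LoadXml throw.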
So here comes the surgical part. We need just the text that users can see: the text between HTML tags, and the value attributes of buttons. What we don't want is text boxes or textareas, since users will be typing in their own language there. We also don't care about what's in script or style blocks.
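Private Sub TranslateNode(ByVal poNode As XmlNode)
    Dim lbTraverse As Boolean = True
    Select Case poNode.Name.ToLower
        Case "#text"
            ' Text between tags is what users actually read
            poNode.Value = TranslateText(poNode.Value, Nothing)
        Case "input"
            ' Only button captions get translated; text boxes hold user input
            Dim lsType As String
            If poNode.Attributes.ItemOf("type") Is Nothing Then
                lsType = "text"
            Else
                lsType = poNode.Attributes.ItemOf("type").Value.ToLower
            End If
            If Not poNode.Attributes.ItemOf("value") Is Nothing Then
                Select Case lsType
                    Case "button", "submit"
                        poNode.Attributes.ItemOf("value").Value = _
                            TranslateText(poNode.Attributes.ItemOf("value").Value, poNode)
                End Select
            End If
        Case "script", "style", "textarea"
            ' Never translate code, styles, or what users type
            lbTraverse = False
    End Select
    ' Recurse into the children unless this element was excluded
    If lbTraverse Then
        For Each loNode As XmlNode In poNode.ChildNodes
            TranslateNode(loNode)
        Next
    End If
End Sub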
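To try the traversal rules on their own, here is a minimal sketch that passes the translator in as a Func(Of String, String), so no MT service or database is needed. DomWalker and TranslateTree are hypothetical names; the selection rules mirror TranslateNode: text nodes are translated, input values only for button and submit types, and script, style and textarea subtrees are skipped.

```vb
Imports System
Imports System.Xml

' Hypothetical sketch of the traversal rules, with the translator passed
' in as a Func so the walk can be exercised without MT or a database.
Public Module DomWalker
    Public Sub TranslateTree(node As XmlNode, translate As Func(Of String, String))
        Select Case node.Name.ToLowerInvariant()
            Case "#text"
                ' Visible text between tags; skip whitespace-only nodes
                If node.Value.Trim() <> "" Then node.Value = translate(node.Value.Trim())
            Case "input"
                ' Only button captions; text boxes hold the user's own words
                Dim typeAttr As XmlAttribute = node.Attributes.ItemOf("type")
                Dim valueAttr As XmlAttribute = node.Attributes.ItemOf("value")
                Dim inputType As String = If(typeAttr Is Nothing, "text", typeAttr.Value.ToLowerInvariant())
                If valueAttr IsNot Nothing AndAlso (inputType = "button" OrElse inputType = "submit") Then
                    valueAttr.Value = translate(valueAttr.Value)
                End If
            Case "script", "style", "textarea"
                ' Code, styles and user input are never translated
                Return
        End Select
        For Each child As XmlNode In node.ChildNodes
            TranslateTree(child, translate)
        Next
    End Sub
End Module
```

Passing a fake translator such as Function(s) s.ToUpperInvariant() makes it easy to see exactly which parts of a page the module would send to MT.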
So, now that we have the text we want to translate, how do we translate it? The downside of MT is that it's slow, at least compared to looking a word up in a database. So once we translate a word or phrase, we save it for reuse by our website, or by any website that uses this HTTP module.
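Private Function TranslateText(ByVal psString As String, _
    ByVal pnode As XmlNode) As String
    Dim result As String = String.Empty
    ' Strip line breaks and tabs so the same phrase always maps to the same key
    psString = psString.Replace(vbCrLf, "")
    psString = psString.Replace(vbCr, "")
    psString = psString.Replace(vbLf, "")
    psString = psString.Replace(vbTab, "")
    psString = psString.Trim
    ' Whitespace-only text is not worth a lookup or an MT call
    If psString = String.Empty Then Return psString
    ' Check the shared table of known translations first
    result = Lookup(psString)
    If result.Trim = String.Empty Then
        Dim loTranslator As ITranslator = New Systran
        If Not loTranslator Is Nothing AndAlso _
            Array.Exists(loTranslator.AvailableLanguages, AddressOf HasLanguage) Then
            result = loTranslator.Translate(psString, "en", Language)
            ' ...store psString/result in the translation table here so
            ' every site can reuse it...
        Else
            ' No MT support for this language: show English and switch on
            ' the manual-translation script
            _IsManaul = True
            result = psString
        End If
    End If
    Return result
End Function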
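The lookup-first flow is what keeps MT costs down: each phrase is sent to the service at most once per language. Here is a minimal sketch of that idea with an in-memory Dictionary standing in for the shared database table. TranslationCache is a hypothetical class, and the machine translator is assumed to return Nothing for unsupported languages, which here plays the role of the _IsManaul fallback.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical sketch: an in-memory stand-in for the shared translation
' table. A miss calls the machine translator once; later requests for the
' same language and phrase are served from the cache.
Public Class TranslationCache
    Private ReadOnly _known As New Dictionary(Of String, String)(StringComparer.Ordinal)
    Private ReadOnly _machine As Func(Of String, String, String)

    ' machine(text, toLanguage) returns the translation, or Nothing when
    ' the language is unsupported
    Public Sub New(machine As Func(Of String, String, String))
        _machine = machine
    End Sub

    Public Function Translate(text As String, toLanguage As String) As String
        ' Key on language and phrase together; Chr(0) never appears in page text
        Dim key As String = toLanguage & ChrW(0) & text
        Dim result As String = Nothing
        If _known.TryGetValue(key, result) Then Return result
        result = _machine(text, toLanguage)
        ' Unsupported language: fall back to the English text, cache nothing
        If result Is Nothing Then Return text
        _known(key) = result ' learned once, reused by every caller
        Return result
    End Function
End Class
```

In the real module the dictionary is a database table, which is what lets one site's lookups benefit every other site.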
Here is where the magic happens. I set this up so you can plug in your own translator just by implementing ITranslator. We are looking at using Systran, which has a web service API; here is its Translate implementation:
Public Function Translate(ByVal psString As String, _
    ByVal psFromLanguage As String, ByVal psToLanguage As String) _
    As String Implements ITranslator.Translate
    ' The service URL comes from application settings; @Language is
    ' replaced with the target language code
    Dim lsURL As String = My.Settings.TransaltionURL.Trim.Replace("@Language", psToLanguage)
    Using loWebclient As New WebClient
        loWebclient.Encoding = Encoding.UTF8
        ' POST the English text; the reply is prefixed with "body=", which is stripped
        Dim result As String = loWebclient.UploadString(lsURL, psString).Trim
        Return result.Replace("body=", "")
    End Using
End Function
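The ITranslator interface itself is not shown. Judging only from how the module calls it (an AvailableLanguages array passed to Array.Exists, and a three-argument Translate), it probably looks something like the sketch below, though the real definition may differ. GlossaryTranslator is a hypothetical offline implementation backed by a fixed glossary, handy for testing the module without calling a web service.

```vb
Imports System
Imports System.Collections.Generic

' Inferred from how the module calls it; the real interface may differ.
Public Interface ITranslator
    ' Target language codes the engine supports, e.g. "ja"
    ReadOnly Property AvailableLanguages As String()
    Function Translate(psString As String, psFromLanguage As String,
                       psToLanguage As String) As String
End Interface

' Hypothetical offline translator backed by a fixed glossary.
Public Class GlossaryTranslator
    Implements ITranslator

    Private ReadOnly _glossary As Dictionary(Of String, String)
    Private ReadOnly _language As String

    Public Sub New(language As String, glossary As Dictionary(Of String, String))
        _language = language
        _glossary = glossary
    End Sub

    Public ReadOnly Property AvailableLanguages As String() _
        Implements ITranslator.AvailableLanguages
        Get
            Return New String() {_language}
        End Get
    End Property

    Public Function Translate(psString As String, psFromLanguage As String,
                              psToLanguage As String) As String _
        Implements ITranslator.Translate
        Dim result As String = Nothing
        ' Unknown phrases, or other target languages, come back unchanged
        If psToLanguage = _language AndAlso _glossary.TryGetValue(psString, result) Then Return result
        Return psString
    End Function
End Class
```

Swapping New Systran for a GlossaryTranslator in TranslateText would exercise the lookup-and-learn path without any network traffic.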
So What If…
This is the English version of the demo. No translation takes place.
This is translated to Japanese using Systran MT.
Pardon the spelling. Systran does not translate Vietnamese, so the user needs to supply a translation: they select the English text, and a dialog pops up asking for the translation. This is then stored in the database for use by all websites.
This is the database table. Across sites of any size it will get pretty big, but it is shared by all of them: once one site learns a word, every site knows it and doesn't need to look it up again.
Of course, if the user misspells something, as demonstrated here, it will show up misspelled, so adding a multilingual spell checker would be a nice touch.
Get rid of the RESX files and let your websites learn from each other, with MT as the teacher. The code will probably need some massaging to fit into your site, but I hope it's an idea you can use and expand on.