I assume it works like this:

1. Constantly spell-check the input until most of it stops making sense.
2. At that point, "transkey" the input into every known language (or just the languages the user has installed).
3. Spell-check those candidates until one of them makes sense in a specific language, then search for that phrase.
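
To make the three steps concrete, here is a minimal Python sketch. It is only my guess at the idea, not how Google does it. The key map is a partial subset that I assume matches the Arabic (101) layout (the B key is left out because it produces two characters), and `ENGLISH_WORDS`, `transkey`, `known_word_ratio` and `reinterpret` are hypothetical names, with a toy word list standing in for a real spell checker.

```python
# Assumed partial map: character typed in the Arabic (101) layout ->
# character the same physical key produces in the US layout.
ARABIC_TO_US = {
    "ض": "q", "ص": "w", "ث": "e", "ق": "r", "ف": "t", "غ": "y",
    "ع": "u", "ه": "i", "خ": "o", "ح": "p",
    "ش": "a", "س": "s", "ي": "d", "ب": "f", "ل": "g", "ا": "h",
    "ت": "j", "ن": "k", "م": "l",
    "ئ": "z", "ء": "x", "ؤ": "c", "ر": "v", "ى": "n", "ة": "m",
}

# Toy word list standing in for a real spell checker (hypothetical).
ENGLISH_WORDS = {"google", "is", "able", "to", "do", "it", "easily"}


def transkey(text, key_map):
    """Step 2: replace each character by what the same key gives in another layout."""
    return "".join(key_map.get(ch, ch) for ch in text)


def known_word_ratio(text, words):
    """Steps 1 and 3: fraction of whitespace-separated tokens found in the word list."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in words for t in tokens) / len(tokens)


def reinterpret(typed, candidates, threshold=0.6):
    """candidates maps language -> (key_map, word_set).

    Returns (language, transkeyed_text) for the best-scoring candidate
    whose score reaches the threshold, or None if nothing makes sense.
    """
    best, best_score = None, threshold
    for lang, (key_map, words) in candidates.items():
        text = transkey(typed, key_map)
        score = known_word_ratio(text, words)
        if score >= best_score:
            best, best_score = (lang, text), score
    return best


# "google is easily" typed with the input language set to Arabic (101)
typed = "لخخلمث هس ثشسهمغ"
print(reinterpret(typed, {"en": (ARABIC_TO_US, ENGLISH_WORDS)}))
```

The threshold of 0.6 is arbitrary. A real version would also have to decide in step 1 that the text does not already make sense in the language it was typed in.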

http://imageshack.us/photo/my-images/89/capturece.jpg/[^]
In this image I'm typing "google is able to do it easily" with the input language set to Arabic (101), and Google detects that what I typed was meant to be English. It works with key combinations as well, not only single keys.

(An ordinary US keyboard has 101 keys. To keep the problem from getting more complicated, we could exclude languages that use more keys while working out a method, if that shortens the way, and worry about them later.)

They are not going to let us know how they did it - if they did that, it would not be unique any more.
:)

This[^] would be a simple approach to start language detection, but really, there are tons more to it.
 
 
Comments
Sergey Alexandrovich Kryukov 18-Jul-11 2:10am    
Agree, but the very first step should be pretty simple -- please see my answer. :-)
--SA
Hesham_h4 18-Jul-11 12:34pm    
I think this method is close to the one used in Google Translate (not accurate at all, though). It detects the language of text written in that language, not of text typed with a different language's keyboard layout, which is our topic here.
Ask Google, but don't expect the answer :-).

Now, the start of it would be much simpler than you suggested. You can get the dominant subset of Unicode code points used. Most often it is characterized by the high-order bits of the code point's integer value, that is, by the Unicode block it falls in (for example, U+04xx for Cyrillic and U+06xx for Arabic). Cyrillic will cover several Slavic languages and some Asian languages, Perso-Arabic script will cover Persian, Arabic and a few languages of the Indian subcontinent, and so on. This alone will greatly narrow down the search. After that, referring to dictionaries should come, with all the complex techniques.
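
A rough Python sketch of that first step, just to show the idea. The block table is deliberately tiny and the names (`BLOCKS`, `dominant_script`) are made up for illustration; a real implementation would use the full Unicode script data rather than a handful of hard-coded ranges.

```python
from collections import Counter

# A few Unicode block ranges (inclusive). Incomplete on purpose.
BLOCKS = [
    (0x0000, 0x024F, "Latin"),       # Basic Latin through Latin Extended-B
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
]


def dominant_script(text):
    """Return the script name most letters of `text` fall into, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        cp = ord(ch)
        for lo, hi, name in BLOCKS:
            if lo <= cp <= hi:
                counts[name] += 1
                break
        else:
            counts["Other"] += 1
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("привет мир"))
print(dominant_script("google is able to do it easily"))
```

Only the candidate languages written in the returned script then need dictionary lookups.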

SA
 
 
Comments
Abhinav S 18-Jul-11 2:10am    
Yes an interesting approach. My 5.
Sergey Alexandrovich Kryukov 18-Jul-11 2:13am    
Thank you, Abhinav. Just a first simple step. Everything else is way more complex. OP gave one idea about spell-checking.
--SA
Hesham_h4 18-Jul-11 12:52pm    
I think I'll ask them, and if they answer me (5% probability) I'll mark that as the solution :).
The funny thing is that I asked this question on the Microsoft forums, forgetting that Bing would be using it if they knew how :)
About your approach: I think detecting the input language isn't a big deal in a .NET application, so the spell check would be appropriate enough. "Transkeying", as I called it, is still the big problem here. We need to know which keys were pressed; let's assume we do, because we are working on user input and not on copy-pasted text. Then we need to simulate each of these scripts to get a probable text (remember that some characters can be entered in multiple ways), and then your method comes into play to detect which one is used and so narrow down the candidate languages.
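
To show what I mean by "entered in multiple ways", here is a small Python sketch that goes from text back to every key sequence that could have produced it. The layout is a tiny subset that I assume matches Arabic (101), where the B key types lam followed by alef, the same two characters as pressing G and then H. `LAYOUT` and `key_sequences` are just illustrative names.

```python
# Assumed subset of the Arabic (101) layout: US key -> text it produces.
LAYOUT = {
    "a": "\u0634",          # sheen
    "g": "\u0644",          # lam
    "h": "\u0627",          # alef
    "b": "\u0644\u0627",    # lam + alef from a single key
    "l": "\u0645",          # meem
    "e": "\u062b",          # theh
}


def key_sequences(text, layout):
    """Return every key sequence that the layout maps to exactly `text`.

    Characters the layout cannot produce yield no result at all.
    The search is exhaustive, so it can blow up on long ambiguous input.
    """
    if not text:
        return [""]
    results = []
    for key, output in layout.items():
        if text.startswith(output):
            for rest in key_sequences(text[len(output):], layout):
                results.append(key + rest)
    return results


# What "able" looks like when typed with this layout active
typed = LAYOUT["a"] + LAYOUT["b"] + LAYOUT["l"] + LAYOUT["e"]
print(sorted(key_sequences(typed, LAYOUT)))
```

Each recovered key sequence is a separate candidate for the spell check, which is why knowing the actual key presses instead of only the resulting text makes the problem easier.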

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


