I need to detect the language(s) of a string in my standalone program.
I don't want to use Bing Translate or Google Translate.
Thanks.

In fact, I tried this, but I cannot do it for all languages, can I?

C#
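using System.Linq; // needed for Any()

public string FindLang(string text)
{
    string result = "";
    // U+FB50..U+FEFC: Arabic Presentation Forms-A and -B (plus a few small blocks in between)
    if (text.Any(c => c >= 0xFB50 && c <= 0xFEFC))
    {
        result += "Arabic";
    }
    // U+0600..U+06FF: the Arabic block, shared by Arabic, Persian, Urdu and others
    if (text.Any(c => c >= 0x0600 && c <= 0x06FF))
    {
        result += "Persian";
    }
    // U+0020..U+007E: printable ASCII, including spaces, digits and punctuation
    if (text.Any(c => c >= 0x20 && c <= 0x7E))
    {
        result += "English";
    }
    // U+0530..U+058F: the Armenian block
    if (text.Any(c => c >= 0x0530 && c <= 0x058F))
    {
        result += "Armenian";
    }
    // U+2000..U+FA2D: a very wide range (punctuation, symbols, kana, Hangul, CJK ideographs)
    if (text.Any(c => c >= 0x2000 && c <= 0xFA2D))
    {
        result += "Chinese";
    }
    return result;
}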
Updated 23-Dec-12 14:43pm
Comments
Zoltán Zörgő 23-Dec-12 16:36pm    
Frankly: forget it.
austinbox 23-Dec-12 17:18pm    
Try checking in the string for unique characters from each language, then go from there.
Christ_88 23-Dec-12 20:44pm    
The code I added does exactly what you said, am I right?

If your string is long enough, you may attempt a heuristic approach: build a dictionary of the most frequent words for each language your application supports, then find the best match for the given string.
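The heuristic above could be sketched roughly as follows. This is only a minimal illustration: the class name `StopWordLanguageGuesser`, the method `Guess`, and the tiny word lists are assumptions made up for the example, not part of any library. A real application would load a few hundred frequent words per language from its own data files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordLanguageGuesser
{
    // Tiny illustrative word lists (assumptions); real lists would be much longer.
    private static readonly Dictionary<string, HashSet<string>> FrequentWords =
        new Dictionary<string, HashSet<string>>
        {
            { "English", new HashSet<string> { "the", "and", "is", "of", "to", "in", "that", "it" } },
            { "Spanish", new HashSet<string> { "el", "la", "de", "que", "y", "en", "los", "está" } },
            { "German",  new HashSet<string> { "der", "die", "und", "das", "ist", "nicht", "ein", "zu" } },
        };

    // Returns the language whose frequent-word list matches the most words
    // in the text, or "unknown" when no word matches at all.
    public static string Guess(string text)
    {
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '¿', '¡', '"', '(', ')' };
        string[] words = text.ToLowerInvariant()
                             .Split(separators, StringSplitOptions.RemoveEmptyEntries);

        string best = "unknown";
        int bestScore = 0;
        foreach (var entry in FrequentWords)
        {
            // Count how many words of the text appear in this language's list.
            int score = words.Count(w => entry.Value.Contains(w));
            if (score > bestScore)
            {
                bestScore = score;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

With these lists, `StopWordLanguageGuesser.Guess("The cat is in the garden")` matches four English words and none from the other lists. Very short strings often match nothing, which is why the answer restricts the idea to strings that are long enough.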
 
 
First of all, the presence of characters from a specific Unicode block does not by itself identify the language. It can serve as one extra signal, but it cannot be the only one. Also, deciding which of a short list of languages (say, up to 10) a text belongs to is one thing; identifying the language of an arbitrary string out of all languages is quite another. The fewer words you have and the more languages you must distinguish, the lower your accuracy will be. Keep in mind, too, that languages are not disjoint sets, neither in vocabulary nor in character set. Estimates of the number of languages range from about 6,000 to 7,000 (Wikipedia[^]).
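To illustrate why a character range is only a hint, here is a small sketch that maps Unicode blocks to several candidate languages instead of a single answer. The class `ScriptHints`, the method `CandidateLanguages`, and the candidate lists are assumptions for illustration and are far from complete; the block boundaries are the standard Unicode ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScriptHints
{
    private sealed class Block
    {
        public int Start;
        public int End;
        public string[] Candidates;
    }

    // Each block maps to SEVERAL languages; the lists are illustrative only.
    private static readonly Block[] Blocks =
    {
        // Basic Latin through Latin Extended-B
        new Block { Start = 0x0041, End = 0x024F, Candidates = new[] { "English", "Spanish", "German", "French" } },
        // Armenian block
        new Block { Start = 0x0530, End = 0x058F, Candidates = new[] { "Armenian" } },
        // Arabic block, shared by several languages
        new Block { Start = 0x0600, End = 0x06FF, Candidates = new[] { "Arabic", "Persian", "Urdu" } },
        // CJK Unified Ideographs
        new Block { Start = 0x4E00, End = 0x9FFF, Candidates = new[] { "Chinese", "Japanese" } },
    };

    // Returns every language that COULD be present, judging only by the
    // letters in the text; digits, spaces and punctuation are ignored.
    public static List<string> CandidateLanguages(string text)
    {
        var candidates = new HashSet<string>();
        foreach (char c in text.Where(ch => char.IsLetter(ch)))
        {
            foreach (Block b in Blocks)
            {
                if (c >= b.Start && c <= b.End)
                {
                    candidates.UnionWith(b.Candidates);
                }
            }
        }
        return candidates.OrderBy(name => name, StringComparer.Ordinal).ToList();
    }
}
```

A string written in Arabic letters yields Arabic, Persian and Urdu as candidates, so a second step (word lists or statistics) is still needed to pick one of them.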

I understand that you want a solution, but what you describe is a really hard signal processing task. If you read this article carefully (Detect a written text's language[^]), you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. Bear in mind that I don't know exactly what you need, only what you wrote.
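One common offline technique for this kind of task is to compare character trigrams of the input against per-language profiles. The sketch below is an assumption about how such an approach might look, not a description of what the linked article does; the class `TrigramLanguageGuesser` and the one-sentence training samples are hypothetical and far too small for real use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrigramLanguageGuesser
{
    // Tiny illustrative training samples (assumptions, not real corpora).
    private static readonly Dictionary<string, HashSet<string>> Profiles =
        new Dictionary<string, HashSet<string>>
        {
            { "English", new HashSet<string>(Trigrams(
                "the quick brown fox jumps over the lazy dog and the cat is in the house")) },
            { "Spanish", new HashSet<string>(Trigrams(
                "el rápido zorro marrón salta sobre el perro perezoso y el gato está en la casa")) },
        };

    // Yields every run of three consecutive characters of the lower-cased text.
    private static IEnumerable<string> Trigrams(string text)
    {
        string lower = text.ToLowerInvariant();
        for (int i = 0; i + 3 <= lower.Length; i++)
        {
            yield return lower.Substring(i, 3);
        }
    }

    // Scores each language by how many trigrams of the input occur in its
    // profile; returns "unknown" when nothing matches.
    public static string Guess(string text)
    {
        List<string> grams = Trigrams(text).ToList();
        string best = "unknown";
        int bestScore = 0;
        foreach (var profile in Profiles)
        {
            int score = grams.Count(g => profile.Value.Contains(g));
            if (score > bestScore)
            {
                bestScore = score;
                best = profile.Key;
            }
        }
        return best;
    }
}
```

Trigrams tolerate unknown words better than whole-word lists, but the profiles need far more training text than shown here, and accuracy still drops for short inputs and closely related languages.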

Good luck.
 
 
JavaScript
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language + "";
  }
});

Reference: stackoverflow.com[^]
 
 
Comments
Christ_88 23-Dec-12 23:41pm    
Thanks for the answer.
I have already seen this code, but my program is going to be standalone, with no internet connection.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)