See more: C#, ASP.NET
I need to detect the language(s) of a string in my standalone program.
I don't want to use Bing Translate or Google Translate.
Thanks.
 
In fact, I tried this, but I cannot do it for all languages, can I?
 
  public string FindLang(string text)
        {
            string result = "";
            if (text.Any(c => c >= 0xFB50 && c <= 0xFEFC))
            {
                result += "Arabic";
            }
            if (text.Any(c => c >= 0x0600 && c <= 0x06FF))
            {
                result += "Persian";
            }
            if (text.Any(c => c >= 0x20 && c <= 0x7E))
            {
                result += "English";
            }
            if (text.Any(c => c >= 0x0530 && c <= 0x058F))
            {
                result += "Armenian";
             }
            if (text.Any(c => c >= 0x2000 && c <= 0xFA2D))
            {
                result += "Chinese";
            }
            return result;
        }
Posted 23 Dec '12 - 9:57
Edited 23 Dec '12 - 14:43

Comments
Zoltán Zörgő - 23 Dec '12 - 16:36
Frankly: forget it.
austinbox - 23 Dec '12 - 17:18
Try checking in the string for unique characters from each language, then go from there.
Christ_88 - 23 Dec '12 - 20:44
The code I added does exactly what you said, am I right?

3 solutions

First of all, the presence of characters from a particular Unicode block does not identify the language; it only identifies the script. That check can be one input to the evaluation, but it cannot be the only one. Also, estimating which language a text is written in from a short list of candidates (say, up to 10) is one thing; telling the language of an arbitrary string out of all languages is quite another. The fewer words the text has and the wider the set of candidate languages is, the lower your accuracy will be. Keep in mind too that languages are not disjoint sets, neither in vocabulary nor in character set. Estimates of the number of languages vary between around 6,000 and 7,000 (Wikipedia[^]).
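
To illustrate the point about characters versus languages, here is a minimal sketch that reports which scripts occur in a string instead of claiming a language. The class name ScriptDetector and its methods are made up for this example, and the block list is deliberately short; it is an illustration under those assumptions, not a complete table of Unicode scripts.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptDetector
{
    // Hypothetical helper: coarse script name for a character, or "" when it
    // is not a letter or falls outside the few blocks listed here.
    private static string ScriptOf(char c)
    {
        if (!char.IsLetter(c)) return "";                  // ignore spaces, digits, punctuation
        if (c <= 0x024F) return "Latin";                   // letters up to Latin Extended-B
        if (c >= 0x0400 && c <= 0x04FF) return "Cyrillic";
        if (c >= 0x0530 && c <= 0x058F) return "Armenian";
        if (c >= 0x0600 && c <= 0x06FF) return "Arabic";   // shared by Arabic, Persian, Urdu, ...
        if (c >= 0xFB50 && c <= 0xFDFF) return "Arabic";   // Arabic Presentation Forms-A
        if (c >= 0xFE70 && c <= 0xFEFF) return "Arabic";   // Arabic Presentation Forms-B
        if (c >= 0x4E00 && c <= 0x9FFF) return "Han";      // shared by Chinese, Japanese, Korean
        return "";
    }

    // Returns the distinct scripts found in the text, in order of first appearance.
    public static List<string> FindScripts(string text)
    {
        var found = new List<string>();
        foreach (char c in text)
        {
            string script = ScriptOf(c);
            if (script.Length > 0 && !found.Contains(script)) found.Add(script);
        }
        return found;
    }
}
```

For a string mixing "Hello" with the word "سلام", this reports Latin and Arabic. It cannot say whether the second word is Arabic or Persian, because both languages are written with the same block of characters.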
 
I understand that you want a solution, but what you specified is a really hard signal processing task. If you read this article carefully (Detect a written text's language[^]), you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. Then again, I don't know what exactly you need, only what you wrote.
 
Good luck.
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language + "";
  }
});
Reference: stackoverflow.com[^]
Comments
Christ_88 - 23 Dec '12 - 23:41
Thanks for the answer. I have already seen this code, but my program is going to be standalone, with no Internet connection.
If your string is long enough, you may attempt a heuristic approach: build a dictionary of the most frequent terms for each language your application supports, and then find the best match for the given string.
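
A minimal sketch of that heuristic could look like the following. The class name StopWordLanguageGuesser, the four languages and the tiny word lists are assumptions made for illustration only; a usable table would need far more words per language and would still be unreliable on short strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordLanguageGuesser
{
    // Tiny illustrative word lists (assumed for this example, not a vetted corpus).
    private static readonly Dictionary<string, HashSet<string>> FrequentWords =
        new Dictionary<string, HashSet<string>>
        {
            { "English", new HashSet<string> { "the", "and", "is", "of", "to", "in", "that", "it" } },
            { "Spanish", new HashSet<string> { "el", "la", "de", "que", "y", "en", "los", "es" } },
            { "German",  new HashSet<string> { "der", "die", "und", "das", "ist", "nicht", "ein", "zu" } },
            { "French",  new HashSet<string> { "le", "la", "et", "les", "des", "est", "une", "que" } },
        };

    // Returns the language whose frequent words occur most often in the text,
    // or "unknown" when none of the listed words occurs at all.
    public static string Guess(string text)
    {
        // Lowercase the text, turn every non-letter into a space, then split into words.
        string[] words = new string(text.ToLowerInvariant()
                .Select(c => char.IsLetter(c) ? c : ' ').ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        string best = "unknown";
        int bestScore = 0;
        foreach (var entry in FrequentWords)
        {
            // Score = number of words in the text that appear in this language's list.
            int score = words.Count(w => entry.Value.Contains(w));
            if (score > bestScore)
            {
                best = entry.Key;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Note that some words appear in more than one list ("la", "que"), which is exactly why short inputs are ambiguous: the score only becomes meaningful once the text contains several frequent words.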

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 24 Dec 2012
Copyright © CodeProject, 1999-2013