See more: C# ASP.NET
I need to detect the language(s) of a string in my standalone program.
I don't want to use Bing Translate or Google Translate.
Thanks.
 
In fact, I tried the following, but I cannot do this for all languages, can I?
 
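  // Needs: using System.Collections.Generic; using System.Linq;
  // These ranges identify Unicode scripts, which is only a rough hint of the language.
  public string FindLang(string text)
        {
            var result = new List<string>();
            // Arabic block plus Arabic Presentation Forms A and B.
            // Arabic and Persian share this script, so ranges alone cannot separate them.
            if (text.Any(c => (c >= 0x0600 && c <= 0x06FF) ||
                              (c >= 0xFB50 && c <= 0xFDFF) ||
                              (c >= 0xFE70 && c <= 0xFEFF)))
            {
                result.Add("Arabic/Persian");
            }
            // ASCII letters only; 0x20-0x7E would also match spaces, digits and punctuation.
            if (text.Any(c => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')))
            {
                result.Add("English");
            }
            if (text.Any(c => c >= 0x0530 && c <= 0x058F))
            {
                result.Add("Armenian");
            }
            // CJK Unified Ideographs (also used in Japanese); 0x2000-0xFA2D covered many unrelated blocks.
            if (text.Any(c => c >= 0x4E00 && c <= 0x9FFF))
            {
                result.Add("Chinese");
            }
            return string.Join(", ", result);
        }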
Posted 23-Dec-12 9:57am
Edited 23-Dec-12 14:43pm
v2
Comments
Zoltán Zörgő at 23-Dec-12 16:36pm
   
Frankly: forget it.
austinbox at 23-Dec-12 17:18pm
   
Try checking in the string for unique characters from each language, then go from there.
Christ_88 at 23-Dec-12 20:44pm
   
The code I added does exactly what you said, am I right?

Solution 1

If your string is long enough, you may attempt a heuristic approach: build a dictionary of the most frequent words for each language your application supports, and then find the best match for the given string.
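
A minimal sketch of this frequent-word idea follows. The class name `StopWordLanguageGuesser`, the method `Guess`, and the tiny word lists are all hypothetical and chosen only for illustration; a usable version would need hundreds of frequent words per language and would still be unreliable on short strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordLanguageGuesser
{
    // Tiny illustrative word lists (an assumption, not real frequency data).
    private static readonly Dictionary<string, HashSet<string>> FrequentWords =
        new Dictionary<string, HashSet<string>>
        {
            { "English", new HashSet<string> { "the", "and", "is", "of", "to", "in", "where" } },
            { "Spanish", new HashSet<string> { "el", "la", "de", "y", "que", "en", "está" } },
            { "German",  new HashSet<string> { "der", "die", "und", "ist", "das", "nicht", "wo" } },
        };

    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '¿', '¡' };

    // Returns the language whose frequent words occur most often in the text,
    // or "unknown" when no listed word occurs at all.
    public static string Guess(string text)
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        var best = FrequentWords
            .Select(entry => new
            {
                Language = entry.Key,
                Hits = words.Count(w => entry.Value.Contains(w))
            })
            .OrderByDescending(x => x.Hits)
            .First();

        return best.Hits > 0 ? best.Language : "unknown";
    }
}
```

For example, `StopWordLanguageGuesser.Guess("¿Dónde está el baño?")` matches "está" and "el" in the Spanish list and nothing in the others, so it returns "Spanish". When two languages score the same, this sketch simply returns whichever comes first in the ordering, so it is better to treat ties as "undecided" in real use.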

Solution 3

First of all, finding characters from a particular Unicode block in the text does not mean that we have identified the language. It can serve as one additional signal, but it cannot be the only one. Also, deciding whether a text belongs to one of a short list of languages (say, up to 10) is one thing; telling which language an arbitrary string is written in is quite another. The lower the word count and the wider the range of candidate languages, the less accuracy you will have. Keep in mind that languages are not disjoint sets, neither in vocabulary nor in character set. Estimates of the number of languages vary between around 6,000 and 7,000 (Wikipedia).
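
To illustrate why character sets alone are not enough, here is a rough sketch that maps the characters of a string to the languages commonly written with them. The class `ScriptCandidates`, the method `For`, and the deliberately incomplete table are hypothetical; the point is only that one script usually yields several candidate languages.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptCandidates
{
    // Hypothetical, incomplete table: Unicode range -> some languages written in it.
    private static readonly (int Start, int End, string[] Languages)[] Blocks =
    {
        (0x0041, 0x005A, new[] { "English", "French", "German", "Spanish" }), // A-Z
        (0x0061, 0x007A, new[] { "English", "French", "German", "Spanish" }), // a-z
        (0x0530, 0x058F, new[] { "Armenian" }),
        (0x0600, 0x06FF, new[] { "Arabic", "Persian", "Urdu" }),
        (0x4E00, 0x9FFF, new[] { "Chinese", "Japanese" }),
    };

    // Collects every language whose range contains at least one character of the text.
    public static SortedSet<string> For(string text)
    {
        var candidates = new SortedSet<string>(StringComparer.Ordinal);
        foreach (char c in text)
        {
            foreach (var block in Blocks)
            {
                if (c >= block.Start && c <= block.End)
                {
                    candidates.UnionWith(block.Languages);
                }
            }
        }
        return candidates;
    }
}
```

Calling `ScriptCandidates.For("سلام")` yields Arabic, Persian and Urdu together, because the word uses only letters from the shared Arabic block. Narrowing that set down requires a further signal such as word or letter-sequence frequencies.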
 
I understand that you want a solution, but what you specified is a really hard signal processing task. If you read this article carefully (Detect a written text's language), you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. And I don't know what exactly you need, only what you wrote.
 
Good luck.

Solution 2

var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language + "";
  }
});
Reference: stackoverflow.com
Comments
Christ_88 at 23-Dec-12 23:41pm
   
Thanks for the answer.
I have already seen this code, but my program is going to be standalone, with no internet connection.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Last Updated 24 Dec 2012