How to Find the Language of a String in C#

Question

I need to detect the language (or languages) of a string in my standalone program.
I don't want to use Bing Translate or Google Translate.
Thanks.

In fact, I tried the following, but I cannot make it work for all languages. Can I?

C#

// Requires: using System.Linq;
public string FindLang(string text)
{
    // Each check asks whether any character falls inside a Unicode range.
    string result = "";
    if (text.Any(c => c >= 0xFB50 && c <= 0xFEFC))
    {
        result += "Arabic";
    }
    if (text.Any(c => c >= 0x0600 && c <= 0x06FF))
    {
        result += "Persian";
    }
    if (text.Any(c => c >= 0x20 && c <= 0x7E))
    {
        result += "English";
    }
    if (text.Any(c => c >= 0x0530 && c <= 0x058F))
    {
        result += "Armenian";
    }
    if (text.Any(c => c >= 0x2000 && c <= 0xFA2D))
    {
        result += "Chinese";
    }
    return result;
}

Posted 23-Dec-12 9:57am

Christ_88

Updated 23-Dec-12 14:43pm


Comments

Zoltán Zörgő 23-Dec-12 16:36pm

Honestly: forget it.

austinbox 23-Dec-12 17:18pm

Try checking the string for characters that are unique to each language, then go from there.

Christ_88 23-Dec-12 20:44pm

The code I added does exactly what you said, am I right?

3 solutions

Solution 2

JavaScript

var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    for (l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language + "";
  }
});

Reference: stackoverflow.com

Posted 23-Dec-12 17:32pm

Krunal Rohit

Comments

Christ_88 23-Dec-12 23:41pm

Thanks for the answer.
I have already seen this code, but my program is going to be standalone, with no internet connection.


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

**Zoltán Zörgő** · Accepted Answer · 2012-12-24T08:51:00

First of all, finding some characters from a particular Unicode block in the text does not mean that we have identified the language. Such a check can be one extra signal, but it cannot be the only one. Also, estimating whether a text belongs to one of a short list of languages (say, up to 10) is one thing; telling the language of an arbitrary string out of all languages is quite another. The lower the word count and the wider the range of candidate languages, the less accurate the result will be. And take into account that languages are not disjoint sets, neither in vocabulary nor in character set. Estimates of the number of languages vary between around 6,000 and 7,000 (Wikipedia).
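To illustrate the point about character ranges, here is a minimal sketch that counts characters per script instead of claiming a language. The class name `ScriptDetector`, the method `CountScripts`, and the small block table are my own assumptions for illustration; the table is approximate and far from complete.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptDetector
{
    // Hypothetical, incomplete table: a few Unicode ranges and the script
    // they belong to. A range identifies a script, not a language.
    private static readonly (int Start, int End, string Script)[] Blocks =
    {
        (0x0041, 0x005A, "Latin"),    // A-Z
        (0x0061, 0x007A, "Latin"),    // a-z
        (0x00C0, 0x024F, "Latin"),    // accented Latin letters (approximate range)
        (0x0530, 0x058F, "Armenian"),
        (0x0600, 0x06FF, "Arabic"),   // shared by Arabic, Persian, Urdu and others
        (0xFB50, 0xFDFF, "Arabic"),   // Arabic Presentation Forms-A
        (0xFE70, 0xFEFF, "Arabic"),   // Arabic Presentation Forms-B
        (0x4E00, 0x9FFF, "Han"),      // shared by Chinese and Japanese
    };

    // Counts how many characters of the text fall into each known script.
    // Characters outside the table (spaces, digits, punctuation) are ignored.
    public static Dictionary<string, int> CountScripts(string text)
    {
        var counts = new Dictionary<string, int>();
        foreach (char c in text)
        {
            foreach (var block in Blocks)
            {
                if (c >= block.Start && c <= block.End)
                {
                    counts.TryGetValue(block.Script, out int n);
                    counts[block.Script] = n + 1;
                    break;
                }
            }
        }
        return counts;
    }
}
```

For example, `ScriptDetector.CountScripts("سلام hello")` reports 4 Arabic-script and 5 Latin-script characters. It cannot say whether the first word is Arabic or Persian, or whether the second is English or any other language written in Latin letters, which is why the range check alone is not enough.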

I understand that you want a solution, but what you specified is a really hard signal-processing task. If you read this article carefully (Detect a written text's language), you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. And I don't know what exactly you need, only what you wrote.

Good luck.

CPallini · Accepted Answer · 2012-12-23T11:38:00

Solution 1

If your string is long enough, you may attempt a heuristic approach: build a dictionary of the most frequent terms for each language your application supports, and then find the best match for the given string.
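A rough sketch of that heuristic could look like the following. The class `FrequentWordGuesser`, the method `Guess`, and the tiny word lists are hypothetical and only for illustration; a usable dictionary would need many more frequent terms per language, ideally taken from a real corpus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrequentWordGuesser
{
    // Tiny hypothetical word lists, one per supported language.
    private static readonly Dictionary<string, HashSet<string>> FrequentWords =
        new Dictionary<string, HashSet<string>>
        {
            ["English"] = new HashSet<string> { "the", "and", "is", "of", "to", "in" },
            ["Spanish"] = new HashSet<string> { "el", "la", "de", "que", "y", "en", "es" },
            ["German"]  = new HashSet<string> { "der", "die", "und", "ist", "das", "nicht" },
        };

    // Returns the language whose list matches the most words of the text,
    // or null when no word matches any list.
    public static string Guess(string text)
    {
        // Lower-case the text, turn every non-letter into a space, then split into words.
        var cleaned = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetter(c) ? c : ' ')
            .ToArray());
        var words = cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        string best = null;
        int bestScore = 0;
        foreach (var entry in FrequentWords)
        {
            int score = words.Count(w => entry.Value.Contains(w));
            if (score > bestScore)   // on a tie, the candidate seen first is kept
            {
                bestScore = score;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

With these lists, `FrequentWordGuesser.Guess("The cat is in the house")` returns "English" with four matching words, while `Guess("¿Dónde está el baño?")` returns "Spanish" on the strength of a single match ("el"). That second case shows why the string has to be long enough for the result to mean much.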
