How can we check the language of text entered in the Textbox in ASP.NET

How can we check the language of text entered in a TextBox in ASP.NET C#?

Posted 2-Oct-12 15:46pm

Member 9423565

Comments

Sergey Alexandrovich Kryukov 2-Oct-12 22:48pm

Doesn't your common sense tell you the answer? It's pretty obvious... :-)
--SA

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2012-10-02T16:48:00

This is theoretically impossible, because not every text can be assigned to one particular language (for example, a text made up only of digits and punctuation), and many texts are written in more than one language.

However, let's see whether you can get some limited results.

Set aside the highly sophisticated linguistic methods and the methods that simply rely on huge databases of linguistic information; developing such approaches belongs to computational linguistics and can take a lifetime. Without them, you can determine the language only in a limited number of situations and for a limited number of languages. For example, if you know in advance that only 15 known languages can be used, you can draw one of 17 conclusions: one of the 15 languages, "no specific language", or "failed to determine".

Having made these assumptions, which languages are not suited to such a simple analysis? I assume we are not using dictionaries (which are unreliable anyway, please see below) and are trying to determine the language of the majority of words by collecting some statistics. All we can determine is the Unicode subset each character belongs to. All Unicode code points are classified into subsets, each serving a certain purpose (such as punctuation), and most of these subsets represent a "writing system".
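
As a rough illustration of collecting such statistics, here is a minimal sketch that counts the letters of a string per writing system. The class and method names (`ScriptStatistics`, `ScriptOf`, `CountByScript`, `DominantScript`) are hypothetical, and only a few Unicode blocks are covered, so treat it as a starting point under those assumptions rather than a complete classifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScriptStatistics
{
    // Hypothetical helper: maps a character to a coarse writing-system name
    // using a few Unicode block ranges. Returns null for non-letters.
    public static string ScriptOf(char c)
    {
        if (!char.IsLetter(c)) return null;                      // digits, punctuation, spaces
        if (c <= '\u024F') return "Latin";                       // Basic Latin .. Latin Extended-B
        if (c >= '\u0400' && c <= '\u04FF') return "Cyrillic";   // Cyrillic block
        if (c >= '\u0530' && c <= '\u058F') return "Armenian";   // Armenian block
        if (c >= '\u10A0' && c <= '\u10FF') return "Georgian";   // Georgian block
        return "Other";                                          // everything not covered here
    }

    // Counts the letters of the text per writing system.
    public static Dictionary<string, int> CountByScript(string text)
    {
        return text.Select(ScriptOf)
                   .Where(s => s != null)
                   .GroupBy(s => s)
                   .ToDictionary(g => g.Key, g => g.Count());
    }

    // Returns the writing system with the most letters, or null if the text has no letters.
    public static string DominantScript(string text)
    {
        var counts = CountByScript(text);
        return counts.Count == 0
            ? null
            : counts.OrderByDescending(kv => kv.Value).First().Key;
    }
}
```

In a Web Forms page you would pass the text of the control, for example `ScriptStatistics.DominantScript(TextBox1.Text)`, where `TextBox1` is an assumed control name. Note that the result is a writing system, not a language.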

So, let's see. If a writing system is used by just one language, you can determine the language of a word from its Unicode subset. How good is this method? Right now, I can recall only two languages that can be clearly identified by their writing system: Georgian (although it has dialects that some consider separate languages) and Armenian. I checked the Tamil writing system (Tamil script), but it turns out to be used by some other languages besides Tamil. I'm sure people from different countries can give more examples.
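
A minimal sketch of this idea, limited to the two examples just mentioned, could look like the code below. The names `Verdict` and `ScriptBasedGuess` are made up for illustration; the three outcomes mirror the "a language", "no specific language" and "failed to determine" conclusions discussed earlier.

```csharp
using System;
using System.Linq;

public enum Verdict { Language, NoSpecificLanguage, Undetermined }

public static class ScriptBasedGuess
{
    // Hypothetical helper: names a language only when every letter belongs to a
    // writing system that is (almost) exclusive to that language.
    public static Verdict Guess(string text, out string language)
    {
        language = null;
        var letters = text.Where(c => char.IsLetter(c)).ToArray();

        // Only digits, punctuation or whitespace: no language can be assigned.
        if (letters.Length == 0) return Verdict.NoSpecificLanguage;

        if (letters.All(c => c >= '\u0530' && c <= '\u058F'))   // Armenian block
        {
            language = "Armenian";
            return Verdict.Language;
        }
        if (letters.All(c => c >= '\u10A0' && c <= '\u10FF'))   // Georgian block
        {
            // Caveat from the text: some related varieties use the same script.
            language = "Georgian";
            return Verdict.Language;
        }

        // Latin, Cyrillic, mixed text and so on: the script alone is not enough.
        return Verdict.Undetermined;
    }
}
```

For example, `Guess("Հայաստան", out lang)` yields `Verdict.Language` with `lang == "Armenian"`, while `Guess("Hello", out lang)` yields `Verdict.Undetermined`.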

The Chinese writing system is also used in Korea, Japan and elsewhere; and even within China, the languages that share the common writing system are considered different languages. Calling the whole writing system a "language" would be incorrect, but it would still give you some classification.

The simplest typical case is Cyrillic and Latin. Each of these writing systems is used by different languages, sometimes very different ones. For example, Cyrillic is used by modern Mongolian, a language of the proposed Altaic family (which groups, roughly, Turkic, Mongolic, Tungusic, Japonic, and Korean), and many Turkic languages use Latin or Cyrillic, as well as the Arabo-Persian system. How do you deal with that?

I don't want to discuss the many more complex but fairly common situations. For example, have you seen artificially "invented" words spelled with a mix of writing systems (usually two; most people don't know more) to create a humorous or catchy (advertising) effect?

And finally, let's imagine that you have access to dictionaries for all the world's languages, and that the dictionaries work infinitely fast. Will that work? It is not so simple. I already mentioned that many texts are in mixed languages. For example, take the speech of software developers who are not native English speakers. Another example: many Ukrainians and people from southern Russian regions speak dialects that mix many Ukrainian and Russian words. Besides, there are many, many words that are spelled in the same writing system, are completely identical in spelling, but belong to different languages. Moreover, they may have completely different meanings in these languages; there are numerous jokes about funny coincidences between languages.
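
To make the ambiguity concrete, here is a small sketch of a dictionary lookup. The word lists are tiny hand-made samples for illustration only, and the class name `DictionaryLookup` is hypothetical; a real dictionary would be far larger but would hit the same problem.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryLookup
{
    // Tiny illustrative word lists, not real dictionaries.
    private static readonly Dictionary<string, HashSet<string>> Words =
        new Dictionary<string, HashSet<string>>
        {
            { "English", new HashSet<string> { "gift", "chat", "house" } },
            { "German",  new HashSet<string> { "gift", "haus" } },
            { "French",  new HashSet<string> { "chat", "maison" } }
        };

    // Returns every language whose word list contains the word (case-insensitive),
    // sorted by language name.
    public static List<string> LanguagesOf(string word)
    {
        var key = word.ToLowerInvariant();
        return Words.Where(kv => kv.Value.Contains(key))
                    .Select(kv => kv.Key)
                    .OrderBy(name => name)
                    .ToList();
    }
}
```

`LanguagesOf("Gift")` returns both English and German, although the word means "present" in one and "poison" in the other, so even a perfect lookup does not settle the language of a single word.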

In conclusion, you should accept that the general problem is prohibitively difficult, and not really for technical reasons. The very notion of "the language of the text" is not really correct, even though it can be applied to a relatively limited set of cases.

—SA