First of all, finding specific characters from the Unicode chart in a text does not mean that you have identified the language. It can serve as an additional check, but it cannot be the only one. Also, estimating whether a text belongs to one of a short list of languages (say, up to 10) is one thing; telling which language an arbitrary string is written in is quite another. The lower the word count and the wider the range of candidate languages, the less accurate the result will be. Keep in mind, too, that languages are not disjoint sets, neither in their vocabulary nor in their character sets. Estimates of the number of languages vary between around 6,000 and 7,000 (Wikipedia).
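To illustrate why character ranges alone are not enough, here is a minimal Python sketch. It assumes that the first word of a character's Unicode name is a good enough stand-in for its script, which is only a rough approximation; the function name scripts_in and the sample strings are made up for this example.

```python
import unicodedata
from collections import Counter

def scripts_in(text: str) -> Counter:
    """Count letters by the first word of their Unicode name.

    Assumption: that first word (LATIN, CYRILLIC, ...) roughly names the script.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts

# Hypothetical sample strings: two languages per script.
samples = {
    "Russian": "привет мир",
    "Bulgarian": "здравей свят",
    "English": "hello world",
    "German": "hallo welt",
}

for language, text in samples.items():
    print(language, dict(scripts_in(text)))
```

Russian and Bulgarian both come out as purely Cyrillic, and English and German as purely Latin, so the character check narrows the candidates but does not pick one.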
I understand that you want a solution, but what you specified is a really hard signal processing task. If you read the article "Detect a written text's language" carefully, you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. And I don't know what exactly you need, only what you wrote.
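For the easier case of a short list of known languages, one common approach is to compare character trigram profiles. The sketch below is only an illustration of that idea and not necessarily what the linked article does; the training sentences, the two-language list, and the names trigrams, score and guess are all assumptions made for this example. A real profile would need far more text per language.

```python
from collections import Counter

def trigrams(text: str) -> Counter:
    """Count character trigrams of the lowercased, whitespace-normalised text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

# Toy training data (hypothetical, far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}

def score(text: str, profile: Counter) -> float:
    """Fraction of the text's trigrams that also occur in the profile."""
    grams = trigrams(text)
    shared = sum(min(count, profile[gram]) for gram, count in grams.items())
    return shared / max(1, sum(grams.values()))

def guess(text: str):
    """Return the best-scoring language from the short list, plus all scores."""
    scores = {lang: score(text, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get), scores

print(guess("the dog and the fox"))
print(guess("der hund und die katze"))
```

Note that guess always returns one of the listed languages, even for text in a language it has never seen, and with very short input the scores carry little information. That is exactly the limitation described above.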
Good luck.