Hi
How can I identify, using C#, whether the text in .doc, text, PDF, and .xls files is in English?

Please give me some sample code.
Comments
walterhevedeich 23-Aug-11 0:36am    
Your question is not clear. Try to elaborate on what you really want.
mottudeepu 23-Aug-11 3:41am    
I think he is asking how to fetch text (which is in English) from .doc, text, PDF, and .xls files.
hemantwithu 23-Aug-11 7:42am    
Actually, I need to judge the language of the retrieved text.

1 solution

Assuming you already know how to extract the text from all these file types, you need to analyse the text and compare each word against its counterparts in every known language to see whether it exists there. Once you have tested every word, if more than some percentage (say 95%) exist only in English, you can be reasonably confident that all the text is English.

As you can see, this is not a trivial task.
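
As a rough illustration of the word-matching idea, here is a minimal sketch in C#. It is a simplification, not a complete solution: it only checks what fraction of the words appear in an English word list and does not compare against other languages. The class name `EnglishHeuristic`, its methods, and the tiny `SampleWords` list are all hypothetical stand-ins; a real check would need a full English dictionary loaded from a file, and the 95% threshold is just the example figure mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EnglishHeuristic
{
    // Hypothetical stand-in word list; replace with a full English dictionary.
    public static readonly HashSet<string> SampleWords = new HashSet<string>(
        new[] { "the", "is", "a", "this", "of", "and", "to", "in", "text", "simple" },
        StringComparer.OrdinalIgnoreCase);

    // Returns the fraction of words in the text that appear in the word list.
    // Case sensitivity depends on the comparer of the set passed in.
    public static double KnownWordRatio(string text, ISet<string> words)
    {
        // \p{L}+ matches runs of letters in any script, so digits and
        // punctuation are ignored (contractions such as "don't" are split).
        List<string> tokens = Regex.Matches(text ?? string.Empty, @"\p{L}+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        if (tokens.Count == 0)
        {
            return 0.0; // no words at all, so nothing to judge
        }

        int known = tokens.Count(t => words.Contains(t));
        return (double)known / tokens.Count;
    }

    // True when at least `threshold` of the words are in the word list.
    public static bool LooksEnglish(string text, ISet<string> words, double threshold = 0.95)
    {
        return KnownWordRatio(text, words) >= threshold;
    }
}
```

You would call it on the text you have already extracted, for example `EnglishHeuristic.LooksEnglish(extractedText, myEnglishWords)`, where `myEnglishWords` is your own dictionary. Short texts and words shared between languages will still give misleading results, which is part of why the task is not trivial.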
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


