I have many text files, about 4 GB in total, in Russian, Greek and English.

Is there a program or other software that can find the most frequent words in these files?

I want it to produce a list of words ordered from most to least used.

I know only C and Matlab. Thanks in advance.
Comments
Richard MacCutchan 22-Dec-13 10:09am    
Then you would need to write a program to read the files, split each line into words, and build some frequency tables.
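
A minimal sketch of that approach in C, under stated assumptions: the files are UTF-8 encoded (the question does not say), a "word" is any run of ASCII letters or bytes >= 0x80, and only ASCII letters are lowercased, so Greek and Russian words that differ only in case are counted separately. The names Table, Entry, table_init, table_add, count_stream and table_sorted are hypothetical helpers made up for this illustration, not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS (1u << 20)  /* hash table size; an arbitrary choice */
#define MAXWORD  256         /* longer words are truncated (assumption) */

typedef struct Entry {
    char *word;
    unsigned long count;
    struct Entry *next;      /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry **buckets;         /* NBUCKETS chains of entries */
    size_t nwords;           /* number of distinct words seen */
} Table;

/* FNV-1a style hash of a NUL-terminated byte string. */
static unsigned long hash(const char *s) {
    unsigned long h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

int table_init(Table *t) {
    t->buckets = calloc(NBUCKETS, sizeof *t->buckets);
    t->nwords = 0;
    return t->buckets ? 0 : -1;
}

/* Increment the count for w, inserting it on first sight. */
int table_add(Table *t, const char *w) {
    Entry **slot = &t->buckets[hash(w) % NBUCKETS];
    for (Entry *e = *slot; e; e = e->next)
        if (strcmp(e->word, w) == 0) { e->count++; return 0; }
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->word = malloc(strlen(w) + 1);
    if (!e->word) { free(e); return -1; }
    strcpy(e->word, w);
    e->count = 1;
    e->next = *slot;
    *slot = e;
    t->nwords++;
    return 0;
}

/* Split a stream into words and count them. Bytes >= 0x80 are treated as
   word characters so multi-byte UTF-8 letters stay intact; this also lets
   non-ASCII punctuation stick to words, which a real tool should handle. */
int count_stream(Table *t, FILE *fp) {
    char buf[MAXWORD];
    size_t n = 0;
    for (;;) {
        int c = getc(fp);
        int is_word = c != EOF && (c >= 0x80 || isalpha(c));
        if (is_word) {
            if (n < MAXWORD - 1)
                buf[n++] = (char)(c < 0x80 ? tolower(c) : c);
        } else {
            if (n > 0) {
                buf[n] = '\0';
                if (table_add(t, buf) != 0) return -1;
                n = 0;
            }
            if (c == EOF) break;
        }
    }
    return 0;
}

/* Order by descending count, then by byte order of the word. */
static int by_count_desc(const void *a, const void *b) {
    const Entry *x = *(const Entry *const *)a;
    const Entry *y = *(const Entry *const *)b;
    if (x->count != y->count) return x->count < y->count ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Return a malloc'd array of t->nwords entry pointers, most frequent first. */
Entry **table_sorted(const Table *t) {
    Entry **v = malloc((t->nwords ? t->nwords : 1) * sizeof *v);
    if (!v) return NULL;
    size_t k = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        for (Entry *e = t->buckets[i]; e; e = e->next) v[k++] = e;
    assert(k == t->nwords);
    qsort(v, t->nwords, sizeof *v, by_count_desc);
    return v;
}
```

To use it, call table_init once, open each file with fopen(path, "rb") and pass it to count_stream, then print word and count for the entries returned by table_sorted. Memory use grows with the number of distinct words rather than with the 4 GB of input, since each file is read one byte at a time and only the table is kept.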

1 solution

The keyword to search for is "concordance". What you need is a variation on that idea. The link below shows a very simple C++ concordance generator that you may be able to rework in C. Failing that, I'm sure you can turn up a C example to adapt if you search around online.


http://www.cse.lehigh.edu/~glennb/oose/concord.cc
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


