12,402,189 members (71,276 online)
Rate this:
See more:
If we have a sentence which is stored in Trie and trying to find a pattern in that. Assuming i am storing text as an array. What if i have repeated words in that sentence. What is correct way of storing it.
Should i keep storing frequency at each leaf node?
What is best way of finding no of occurrence of word in a text?
Posted 18-Mar-13 7:16am
Sergey Alexandrovich Kryukov 18-Mar-13 14:32pm

Why "text as an array"? Of do you mean array of char, a string? The problem needs some considerable work to do it in good performance from scratch.
—SA

Rate this:

## Solution 1

Please see my comment to the question.

A solution with a good performance is possible via a hash table: http://en.wikipedia.org/wiki/Hash_table[^].

This structure keeps data in key-value pairs, supports uniqueness of keys, and allows to quickly find a value by a key. The computational time complexity of search is O(1), and is highly beneficial starting some considerable volume of data. (Please see http://en.wikipedia.org/wiki/Big_O_notation[^].)

You would need to have keys of the string type, to represent a word, and a value should represent a number of words found so far (use pointer to a number). You add words one-by one (first, you would need to split your string into words, another problem to solve), and try to add each word. If a value is found by some key, increment the word count by pointer. If not found, add a new key-value pair with value of 1 (a pointer to a value object). At the end, you will have all words each represented only once, and number of occurrences for each word. Don't forget to deallocate all the memory.

Perhaps, much simple structure with acceptable time complexity (O(log(n)) would be a binary tree: http://en.wikipedia.org/wiki/Binary_tree[^].

The idea of using binary tree is pretty much the same.

With C, it's going to be a good volume of work, but it promises you unforgettable and useful experience, especially in patience.
You can help yourself if you find existing C implementation of a tree or a hash table, which should be easy.

Good luck,
—SA

Top Experts
Last 24hrsThis month
 OriginalGriff 285 Richard Deeming 198 Karthik Bangalore 180 F-ES Sitecore 135 Tomas Takac 100
 OriginalGriff 7,073 ppolymorphe 3,030 Karthik Bangalore 2,917 F-ES Sitecore 2,187 Richard MacCutchan 2,125