Click here to Skip to main content
13,090,178 members (49,784 online)
Rate this:
Please Sign up or sign in to vote.
See more:
If we have a sentence which is stored in Trie and trying to find a pattern in that. Assuming i am storing text as an array. What if i have repeated words in that sentence. What is correct way of storing it.
Should i keep storing frequency at each leaf node?
What is best way of finding no of occurrence of word in a text?
Posted 18-Mar-13 7:16am
Sergey Alexandrovich Kryukov 18-Mar-13 14:32pm
Why "text as an array"? Of do you mean array of char, a string? The problem needs some considerable work to do it in good performance from scratch.

1 solution

Rate this: bad
Please Sign up or sign in to vote.

Solution 1

Please see my comment to the question.

A solution with a good performance is possible via a hash table:[^].

This structure keeps data in key-value pairs, supports uniqueness of keys, and allows to quickly find a value by a key. The computational time complexity of search is O(1), and is highly beneficial starting some considerable volume of data. (Please see[^].)

You would need to have keys of the string type, to represent a word, and a value should represent a number of words found so far (use pointer to a number). You add words one-by one (first, you would need to split your string into words, another problem to solve), and try to add each word. If a value is found by some key, increment the word count by pointer. If not found, add a new key-value pair with value of 1 (a pointer to a value object). At the end, you will have all words each represented only once, and number of occurrences for each word. Don't forget to deallocate all the memory.

Perhaps, much simple structure with acceptable time complexity (O(log(n)) would be a binary tree:[^].

The idea of using binary tree is pretty much the same.

With C, it's going to be a good volume of work, but it promises you unforgettable and useful experience, especially in patience. :-)
You can help yourself if you find existing C implementation of a tree or a hash table, which should be easy.

Good luck,

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

  Print Answers RSS
Top Experts
Last 24hrsThis month

Advertise | Privacy |
Web03 | 2.8.170813.1 | Last Updated 18 Mar 2013
Copyright © CodeProject, 1999-2017
All Rights Reserved. Terms of Service
Layout: fixed | fluid

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100