Click here to Skip to main content
Rate this: bad
good
Please Sign up or sign in to vote.
See more: C pattern
If we have a sentence which is stored in Trie and trying to find a pattern in that. Assuming i am storing text as an array. What if i have repeated words in that sentence. What is correct way of storing it.
Should i keep storing frequency at each leaf node?
What is best way of finding no of occurrence of word in a text?
Posted 18-Mar-13 8:16am
Comments
Sergey Alexandrovich Kryukov at 18-Mar-13 14:32pm
   
Why "text as an array"? Of do you mean array of char, a string? The problem needs some considerable work to do it in good performance from scratch.
—SA

1 solution

Rate this: bad
good
Please Sign up or sign in to vote.

Solution 1

Please see my comment to the question.
 
A solution with a good performance is possible via a hash table: http://en.wikipedia.org/wiki/Hash_table[^].
 
This structure keeps data in key-value pairs, supports uniqueness of keys, and allows to quickly find a value by a key. The computational time complexity of search is O(1), and is highly beneficial starting some considerable volume of data. (Please see http://en.wikipedia.org/wiki/Big_O_notation[^].)
 
You would need to have keys of the string type, to represent a word, and a value should represent a number of words found so far (use pointer to a number). You add words one-by one (first, you would need to split your string into words, another problem to solve), and try to add each word. If a value is found by some key, increment the word count by pointer. If not found, add a new key-value pair with value of 1 (a pointer to a value object). At the end, you will have all words each represented only once, and number of occurrences for each word. Don't forget to deallocate all the memory.
 
Perhaps, much simple structure with acceptable time complexity (O(log(n)) would be a binary tree: http://en.wikipedia.org/wiki/Binary_tree[^].
 
The idea of using binary tree is pretty much the same.
 
With C, it's going to be a good volume of work, but it promises you unforgettable and useful experience, especially in patience. Smile | :)
You can help yourself if you find existing C implementation of a tree or a hash table, which should be easy.
 
Good luck,
—SA
  Permalink  

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

  Print Answers RSS
0 Sergey Alexandrovich Kryukov 609
1 OriginalGriff 587
2 Maciej Los 325
3 Mathew Soji 195
4 BillWoodruff 190
0 OriginalGriff 7,356
1 Sergey Alexandrovich Kryukov 6,712
2 DamithSL 5,461
3 Manas Bhardwaj 4,916
4 Maciej Los 4,475


Advertise | Privacy | Mobile
Web03 | 2.8.1411023.1 | Last Updated 18 Mar 2013
Copyright © CodeProject, 1999-2014
All Rights Reserved. Terms of Service
Layout: fixed | fluid

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100