Click here to Skip to main content
Click here to Skip to main content

Tagged as

Go to top

Ternary Search Tree: Algorithm for word list storage in C#

, 9 Jun 2010
Rate this:
Please Sign up or sign in to vote.
An optimized algorithm for a spell-checker-like database.

Binary Trees in General

Binary trees are used to quickly sort and search large amounts of data. Each tree's leaf has two child leaves (typically called left leaf and right leaf), a search-key, as well as data to be stored. And this database format is adequate for many search applications. Since in the case of text searches, such as a spell-checker's dictionary, the search-key of each leaf in a binary-tree contains a complete word, tests done at each step of the search must compare entire strings of various lengths. These string comparisons are relatively slow and the size of the entire database can be better optimized.

Ternary Search Tree

I learned about binary-trees in a university computer-science technical elective called Algorithms, and after seeing something similar in another article, I invented the Trinary Tree. And I thought I was a genius. But then that's what I thought the first time I realized the word 'Froot' written on a box of cereal was misspelled. No matter. Apparently, this algorithm is old news, and though I can safely pat myself on the back for conjuring it up independently, it may have been something we skimmed over in that Algorithms class oh so long ago. In any case, the real name of this algorithm is Ternary Search Tree. Essentially, it separates the words to be inserted into characters. Each leaf's search-key is a single character, which can either be less than (redirects search through child node called up), greater than (redirects search through child node called down), or equal to the front character of the word to be searched. The word's depth is handled using a third child-node called next.

Reading the article Spell Checker Dictionary Engine, I got the idea of adding a second alt pointer to each leaf in the algorithm that article describes and calling the two child leaves up and down. In this way, the search through the tree can more directly reach the word being sought by sorting the characters of the words in the tree using a binary-tree type sub tree, instead of traversing a list of alt nodes. The search and insertion of a word into the database is done using iteration. By first adding a null character (\0) to the word to be inserted, the tree's root node is sent by reference to the insertion function along with the complete new word. Once in the insertion function, the first character is compared with the current leaf (the leaf which was received as a reference parameter), then depending on this comparison's results, the search either continues along the next, up, or down leaves, chopping off the left-most character of the input search word as it goes along, until it either reaches a null character, or it reaches the end of the input word, indicating the word was found.

The source code published here includes a form with an input text-box, add-button, search textbox, search button, and a feedback label. When you press the Add button, all the words in the input textbox are inserted into the search tree; then you can search the database you've created dynamically by typing text into the search textbox and pressing the search button.

Trinary_Tree.PNG

Using the Code

The class classLeaf has three pointers up, down, and next, and a single character search key chr. If required, a pointer to a data type can be added to the class and read in cases where the end of a word has been reached and the chr field holds a null character.

public class classLeaf 
{ 
  public classLeaf down; 
  public classLeaf up; 
  public classLeaf next; 
  public char chr; 

  public classLeaf(char c) 
  { 
    chr = c; 
  } 
}

All the work is done in the three functions:

private void insertWord(ref classLeaf cLeaf, string strWord)
{
  // test first char of word against this leaf
  if (strWord[0] > cLeaf.chr)
  { // first char belongs below this one
    if (cLeaf.down != null)
      insertWord(ref cLeaf.down, strWord);
      // iterate through tree with current word
    else
    { // there is no leaf going down -> create new leaf 
      classLeaf cDown = new classLeaf(strWord[0]);
      // set this leaf's down field to new leaf
      cLeaf.down = cDown;
      // if there are more letters in word,
      // iterate through tree with current word
      if (strWord.Length > 1) 
        insertWord(ref cDown, strWord);
    }
  }
  else if (strWord[0] < cLeaf.chr)
  { // first char belongs above this leaf
    if (cLeaf.up != null)
      insertWord(ref cLeaf.up, strWord);
      // iterate through tree with current word
    else
    { // there is no leaf going up -> create new leaf
      classLeaf cUp = new classLeaf(strWord[0]);
      // set this leaf's up field to new leaf
      cLeaf.up = cUp;
      // if there are more letters in word,
      // iterate through tree with current word
      if (strWord.Length > 1)
        insertWord(ref cUp, strWord);
    }
  }
  else if (strWord[0] == cLeaf.chr)
  { // first char is identical to this leaf's chr field
    if (strWord.Length > 1)
    { // the word insertion is not complete
      if (cLeaf.next != null)
      { 
        insertWord(ref cLeaf.next, strWord.Substring(1));
        // iterate through tree with chopped word
      }
      else
      { // there is no leaf going next -> create new leaf
        classLeaf cNext = new classLeaf(strWord[1]);
        // set this leaf's next field to new leaf
        cLeaf.next = cNext;
        // iterate through tree with current word
        insertWord(ref cNext, strWord.Substring(1));
        // iterate through tree with chopped word
      }
    }
  }
}

which inserts a word into the database, given a reference to the tree's root and the complete word (appended by the null char \0).

There's also the search function:

private bool wordInTree(ref classLeaf cLeaf, string strWord)
{
  // test first char of word against input Leaf's char field
  if (strWord[0] > cLeaf.chr)
  { // first char belongs below this leaf
    if (cLeaf.down != null)
      return wordInTree(ref cLeaf.down, strWord);
    else
      return false;
  }
  else if (strWord[0] < cLeaf.chr)
  { // first char belongs above this leaf
    if (cLeaf.up != null)
      return wordInTree(ref cLeaf.up, strWord);
    else
      return false;
  }
  else if (strWord[0] == cLeaf.chr)
  { // first char matches this leaf
    if (strWord.Length == 1) // last char in input word
      return true; 
    else // iterate along 'next' leaf for second char
      return wordInTree(ref cLeaf.next, strWord.Substring(1));
  }

 MessageBox.Show("this should not happen");
 return false;
}

which searches the database and returns true or false depending on whether or not the word is found.

The tree traversal is done in a way similar to a binary tree's traversal (e.g., left - me - right), except it needs to keep track of the current word it's following through every iteration. Here's the code:

void traverseTree(ref classLeaf cLeaf, ref string[] strWords, string strThisWord)
{
  if (cLeaf.up != null)
    traverseTree(ref cLeaf.up, ref strWords, strThisWord);

  string strExtendedWord = strThisWord + cLeaf.chr.ToString();
  if (cLeaf.chr == chrNull)
  {
    Array.Resize<string>(ref strWords, strWords.Length + 1);
    strWords[strWords.Length - 1] = strExtendedWord;
  }
  else
    if (cLeaf.next != null)
      traverseTree(ref cLeaf.next, ref strWords, strExtendedWord);

  if (cLeaf.down != null)
    traverseTree(ref cLeaf.down, ref strWords, strThisWord);
}

The output is stored in the array of strings, strWords, which is received and passed on as a reference variable, and the traversal sequence is up, next, then down. Notice that the strExtendedWord string created in this function adds this node's char value to the end of the running parameter string strThisWord before the word is stored in the resized array. It is also the same running word that is passed on to the next iteration.

If we were to insert the words boat + boats + bats + bat in that order into an empty tree, the data would resemble this:

Trinary_Tree_data_sample.jpg

Because the tree needs to be initialized with a null-char, and the null-char is considered less than the letters of the alphabet, the first word's first letter is inserted into the root node's down leaf and then the word proceeds along each succeeding leaf's next leaves until the null-char ends the first insertion. Then, when the second word boats is inserted, the algorithm reuses the first four nodes and reaches the first word's null-char, where it adds the s and the final null-char of the second word boats to the first word's terminating null char's down leaf. The next insertion reaches the second letter of the first word before it has to veer up and add the ats of the third word bats. Finally, the last word chops the s off bats by moving up and adding a null-char there, effectively terminating the fourth word bat.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Christ Kennedy
CEO unemployable
Canada Canada
Christ Kennedy, published his fourth novel "Cleats of the Counter Revolution" in the summer of 2010. He grew up in the suburbs of Montreal and is a bilingual Quebecois with a bachelor’s degree in computer engineering from McGill University and is currently walking across ontario plotting a new novel, far away from any computer.

Comments and Discussions

 
QuestionSimilar to 'trie' data structure? PinmemberJohnPool15-Jun-10 22:34 
AnswerRe: Similar to 'trie' data structure? PinmemberChrist Kennedy16-Jun-10 0:58 
GeneralHmmm.. PinmemberCorey Fournier9-Jun-10 2:26 
GeneralRe: Hmmm.. PinmemberChrist Kennedy9-Jun-10 2:57 
GeneralGood article... PinmemberMR_SAM_PIPER8-Jun-10 13:51 
GeneralRe: Good article... PinmemberChrist Kennedy8-Jun-10 15:46 
GeneralRe: Good article... PinmemberJHankins22-Feb-13 14:36 
GeneralGreat article PinmemberMember 439129431-May-10 21:40 
GeneralRe: Great article PinmemberChrist Kennedy1-Jun-10 0:44 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140921.1 | Last Updated 9 Jun 2010
Article Copyright 2010 by Christ Kennedy
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid