12,079,202 members (26,952 online)
alternative version

27.6K views
34 bookmarked
Posted

# Ternary Search Tree: Algorithm for word list storage in C#

, 9 Jun 2010 CPOL
 Rate this:
An optimized algorithm for a spell-checker-like database.

## Binary Trees in General

Binary trees are used to quickly sort and search large amounts of data. Each tree's leaf has two child leaves (typically called left leaf and right leaf), a search-key, as well as data to be stored. And this database format is adequate for many search applications. Since in the case of text searches, such as a spell-checker's dictionary, the search-key of each leaf in a binary-tree contains a complete word, tests done at each step of the search must compare entire strings of various lengths. These string comparisons are relatively slow and the size of the entire database can be better optimized.

## Ternary Search Tree

I learned about binary-trees in a university computer-science technical elective called Algorithms, and after seeing something similar in another article, I invented the Trinary Tree. And I thought I was a genius. But then that's what I thought the first time I realized the word 'Froot' written on a box of cereal was misspelled. No matter. Apparently, this algorithm is old news, and though I can safely pat myself on the back for conjuring it up independently, it may have been something we skimmed over in that Algorithms class oh so long ago. In any case, the real name of this algorithm is Ternary Search Tree. Essentially, it separates the words to be inserted into characters. Each leaf's search-key is a single character, which can either be less than (redirects search through child node called `up`), greater than (redirects search through child node called `down`), or equal to the front character of the word to be searched. The word's depth is handled using a third child-node called `next`.

Reading the article Spell Checker Dictionary Engine, I got the idea of adding a second `alt` pointer to each leaf in the algorithm that article describes and calling the two child leaves `up` and `down`. In this way, the search through the tree can more directly reach the word being sought by sorting the characters of the words in the tree using a binary-tree type sub tree, instead of traversing a list of alt nodes. The search and insertion of a word into the database is done using iteration. By first adding a null character (`\0`) to the word to be inserted, the tree's root node is sent by reference to the insertion function along with the complete new word. Once in the insertion function, the first character is compared with the current leaf (the leaf which was received as a reference parameter), then depending on this comparison's results, the search either continues along the `next`, `up`, or `down` leaves, chopping off the left-most character of the input search word as it goes along, until it either reaches a null character, or it reaches the end of the input word, indicating the word was found.

The source code published here includes a form with an input text-box, add-button, search textbox, search button, and a feedback label. When you press the Add button, all the words in the input textbox are inserted into the search tree; then you can search the database you've created dynamically by typing text into the search textbox and pressing the search button.

## Using the Code

The class `classLeaf` has three pointers `up`, `down`, and `next`, and a single character search key `chr`. If required, a pointer to a data type can be added to the class and read in cases where the end of a word has been reached and the `chr` field holds a null character.

```public class classLeaf
{
public classLeaf down;
public classLeaf up;
public classLeaf next;
public char chr;

public classLeaf(char c)
{
chr = c;
}
}```

All the work is done in the three functions:

```private void insertWord(ref classLeaf cLeaf, string strWord)
{
// test first char of word against this leaf
if (strWord[0] > cLeaf.chr)
{ // first char belongs below this one
if (cLeaf.down != null)
insertWord(ref cLeaf.down, strWord);
// iterate through tree with current word
else
{ // there is no leaf going down -> create new leaf
classLeaf cDown = new classLeaf(strWord[0]);
// set this leaf's down field to new leaf
cLeaf.down = cDown;
// if there are more letters in word,
// iterate through tree with current word
if (strWord.Length > 1)
insertWord(ref cDown, strWord);
}
}
else if (strWord[0] < cLeaf.chr)
{ // first char belongs above this leaf
if (cLeaf.up != null)
insertWord(ref cLeaf.up, strWord);
// iterate through tree with current word
else
{ // there is no leaf going up -> create new leaf
classLeaf cUp = new classLeaf(strWord[0]);
// set this leaf's up field to new leaf
cLeaf.up = cUp;
// if there are more letters in word,
// iterate through tree with current word
if (strWord.Length > 1)
insertWord(ref cUp, strWord);
}
}
else if (strWord[0] == cLeaf.chr)
{ // first char is identical to this leaf's chr field
if (strWord.Length > 1)
{ // the word insertion is not complete
if (cLeaf.next != null)
{
insertWord(ref cLeaf.next, strWord.Substring(1));
// iterate through tree with chopped word
}
else
{ // there is no leaf going next -> create new leaf
classLeaf cNext = new classLeaf(strWord[1]);
// set this leaf's next field to new leaf
cLeaf.next = cNext;
// iterate through tree with current word
insertWord(ref cNext, strWord.Substring(1));
// iterate through tree with chopped word
}
}
}
}```

which inserts a word into the database, given a reference to the tree's root and the complete word (appended by the null char `\0`).

There's also the search function:

```private bool wordInTree(ref classLeaf cLeaf, string strWord)
{
// test first char of word against input Leaf's char field
if (strWord[0] > cLeaf.chr)
{ // first char belongs below this leaf
if (cLeaf.down != null)
return wordInTree(ref cLeaf.down, strWord);
else
return false;
}
else if (strWord[0] < cLeaf.chr)
{ // first char belongs above this leaf
if (cLeaf.up != null)
return wordInTree(ref cLeaf.up, strWord);
else
return false;
}
else if (strWord[0] == cLeaf.chr)
{ // first char matches this leaf
if (strWord.Length == 1) // last char in input word
return true;
else // iterate along 'next' leaf for second char
return wordInTree(ref cLeaf.next, strWord.Substring(1));
}

MessageBox.Show("this should not happen");
return false;
}```

which searches the database and returns `true` or `false` depending on whether or not the word is found.

The tree traversal is done in a way similar to a binary tree's traversal (e.g., left - me - right), except it needs to keep track of the current word it's following through every iteration. Here's the code:

```void traverseTree(ref classLeaf cLeaf, ref string[] strWords, string strThisWord)
{
if (cLeaf.up != null)
traverseTree(ref cLeaf.up, ref strWords, strThisWord);

string strExtendedWord = strThisWord + cLeaf.chr.ToString();
if (cLeaf.chr == chrNull)
{
Array.Resize<string>(ref strWords, strWords.Length + 1);
strWords[strWords.Length - 1] = strExtendedWord;
}
else
if (cLeaf.next != null)
traverseTree(ref cLeaf.next, ref strWords, strExtendedWord);

if (cLeaf.down != null)
traverseTree(ref cLeaf.down, ref strWords, strThisWord);
}```

The output is stored in the array of strings, `strWords`, which is received and passed on as a reference variable, and the traversal sequence is `up`, `next`, then `down`. Notice that the `strExtendedWord` string created in this function adds this node's char value to the end of the running parameter string `strThisWord` before the word is stored in the resized array. It is also the same running word that is passed on to the `next` iteration.

If we were to insert the words boat + boats + bats + bat in that order into an empty tree, the data would resemble this:

Because the tree needs to be initialized with a null-char, and the null-char is considered less than the letters of the alphabet, the first word's first letter is inserted into the root node's `down` leaf and then the word proceeds along each succeeding leaf's `next` leaves until the null-char ends the first insertion. Then, when the second word boats is inserted, the algorithm reuses the first four nodes and reaches the first word's null-char, where it adds the s and the final null-char of the second word boats to the first word's terminating null char's `down` leaf. The next insertion reaches the second letter of the first word before it has to veer `up` and add the ats of the third word bats. Finally, the last word chops the s off bats by moving `up` and adding a null-char there, effectively terminating the fourth word bat.

## Share

Christ Kennedy, published his fourth novel "Cleats of the Counter Revolution" in the summer of 2010. He grew up in the suburbs of Montreal and is a bilingual Quebecois with a bachelor’s degree in computer engineering from McGill University and is currently walking across ontario plotting a new novel, far away from any computer.

## You may also be interested in...

 First Prev Next
 Similar to 'trie' data structure? JohnPool15-Jun-10 23:34 JohnPool 15-Jun-10 23:34
 Re: Similar to 'trie' data structure? Christ Kennedy16-Jun-10 1:58 Christ Kennedy 16-Jun-10 1:58
 Hmmm.. Corey Fournier9-Jun-10 3:26 Corey Fournier 9-Jun-10 3:26
 Re: Hmmm.. Christ Kennedy9-Jun-10 3:57 Christ Kennedy 9-Jun-10 3:57
 Good article... MR_SAM_PIPER8-Jun-10 14:51 MR_SAM_PIPER 8-Jun-10 14:51
 Re: Good article... Christ Kennedy8-Jun-10 16:46 Christ Kennedy 8-Jun-10 16:46
 Re: Good article... JHankins22-Feb-13 15:36 JHankins 22-Feb-13 15:36
 Great article Member 439129431-May-10 22:40 Member 4391294 31-May-10 22:40
 Re: Great article Christ Kennedy1-Jun-10 1:44 Christ Kennedy 1-Jun-10 1:44
 Last Visit: 31-Dec-99 19:00     Last Update: 14-Feb-16 3:20 Refresh 1