English Dictionary

mukhopadhyay somenath


Apr 8, 2006

CPOL


The implementation of an English Dictionary using Ternary Search Trees

Introduction

In this article, I have implemented an English Dictionary application using a Ternary Search Tree in an MFC dialog-based application that has an input field and a list of words. It performs prefix matching, filtering unmatched words out of the list as we type in the input field. It also performs a neighbor search for a word, i.e. it lists the words that closely match the word we have typed.

For example, suppose the dictionary contains the word “bat”. If we type “bat” and click the “More Words” button, the application lists words such as “bat”, “mad”, “mat”, “rat”, “sad” and “sat”. Naturally, only words that are already present in the tree can appear in this list.
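The closeness used here can be pictured as a count of differing characters. The following C++ sketch uses a hypothetical helper, word_distance, which is not part of the article's code. For equal-length words it is the Hamming distance; as an assumption, each character of length difference also counts as one, which is how a near search over a ternary search tree typically treats extra or missing characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not from the article: number of positions at which
// two words differ, plus one for each character by which their lengths differ.
// For equal-length words this is exactly the Hamming distance.
std::size_t word_distance(const std::string& a, const std::string& b)
{
    const std::size_t common = std::min(a.size(), b.size());
    std::size_t diff = (a.size() > b.size()) ? a.size() - b.size()
                                             : b.size() - a.size();
    for (std::size_t i = 0; i < common; ++i)
        if (a[i] != b[i])
            ++diff;
    return diff;
}
```

With it, word_distance("bat", "mat") is 1 and word_distance("bat", "mad") is 2, so all the example words lie within distance 2 of “bat”.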

Another output field, “Meaning”, shows the meaning of the word typed in the input box.

Explanation of the Code

Two main classes implement the dictionary: CTernarySearchTree and CTSTNode.

The dialog class (CEnglishDictionaryDlg) owns the ternary search tree object, which in turn uses the node class.

Let’s look at the dialog class first. Its main functions are OnButton1, OnChangeEdit1, OnButton2 and OnButton3. When this project was first implemented, clicking a button (Button1) loaded the tree with data. That button is now hidden, and its handler, OnButton1, is called from OnInitDialog instead. OnButton2 displays the meaning, OnChangeEdit1 performs prefix matching as we type, and OnButton3 shows the words found by the near search.

That covers the functionality of the application’s front end. The main logic lies in the classes CTSTNode and CTernarySearchTree, which are discussed next.

CTSTNode represents a single node of the tree. Because the tree is a ternary search tree, each node has three children: LOKID, EQKID and HIKID. A node also holds a pointer to the original word (loaded from a text file) and a pointer to its meaning (also loaded from a text file), plus the character cSplitChar against which characters are compared during insertion and search.

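class CTSTNode
{
public:
    friend class CTernarySearchTree;
    friend class CEnglishDictionaryDlg;
    CTSTNode();
    CTSTNode(CTSTNode* p, char* SplitChar)
    {
        cSplitChar = *SplitChar;        // character stored at this node
        EQKID = p;
        LOKID = HIKID = PARENT = NULL;  // leave no child pointer uninitialized
        originalstring = NULL;          // set only on the node that ends a word
        meaning = NULL;
    }
    virtual ~CTSTNode();

private:
    //enum ID {PARENT=0, LOKID, EQKID, HIKID};
    char cSplitChar;                    // split character compared during insert/search
    CTSTNode *LOKID, *HIKID, *EQKID, *PARENT;
    char* originalstring;               // the complete word (end-of-word node only)
    char* meaning;                      // its meaning (end-of-word node only)
};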

Fig: Class declaration of CTSTNode

While inserting a word into the tree, the logic takes one character of the word (SplitChar) and compares it with the current node’s cSplitChar. If SplitChar is smaller (for letters, alphabetically earlier), insertion continues in the LOKID subtree with the same character. If SplitChar is larger, it continues in the HIKID subtree with the same character. If the two are equal, the logic advances to the next character of the word and continues in the EQKID subtree. Missing nodes are created along the way, and the process repeats until the end of the word. This logic can be seen in the Insert function of the CTernarySearchTree class:

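if (*SplitChar != '\0')
{
    no_of_recursion++;
    if (nodeptr == NULL)
    {
        // Empty slot reached: create a node holding the current character.
        nodeptr = new CTSTNode(nodeptr, SplitChar);
        nodeptr->LOKID = nodeptr->HIKID = nodeptr->EQKID = NULL;
    }

    if (*SplitChar < nodeptr->cSplitChar)
    {
        // Smaller character: keep the same character, go to the low subtree.
        nodeptr->LOKID = Insert(nodeptr->LOKID, SplitChar, meaning);
    }
    else if (*SplitChar == nodeptr->cSplitChar)
    {
        // Match: advance to the next character, go to the equal subtree.
        nodeptr->EQKID = Insert(nodeptr->EQKID, ++SplitChar, meaning);
    }
    else
    {
        // Larger character: keep the same character, go to the high subtree.
        nodeptr->HIKID = Insert(nodeptr->HIKID, SplitChar, meaning);
    }
}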

Fig: Snippet from Insert function
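The same three-way descent can be sketched in standard C++ without MFC. This is an illustrative sketch, not the article's implementation: Node, insert and contains are hypothetical names, a word's end is marked with a flag on the node of its last character instead of by reaching '\0', and child ownership uses std::unique_ptr (C++14 for std::make_unique) so no manual delete is needed.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Sketch node (hypothetical, not the article's CTSTNode).
struct Node {
    char split;                        // character compared at this node
    bool is_word_end = false;          // a word ends at this node
    std::unique_ptr<Node> lo, eq, hi;  // LOKID, EQKID, HIKID
    explicit Node(char c) : split(c) {}
};

// Inserts word[i..] below 'node', creating missing nodes on the way.
void insert(std::unique_ptr<Node>& node, const std::string& word, std::size_t i = 0)
{
    if (i >= word.size()) return;                 // empty word: nothing to do
    if (!node) node = std::make_unique<Node>(word[i]);
    if (word[i] < node->split)
        insert(node->lo, word, i);                // smaller: same char, low side
    else if (word[i] > node->split)
        insert(node->hi, word, i);                // larger: same char, high side
    else if (i + 1 < word.size())
        insert(node->eq, word, i + 1);            // equal: next char, equal side
    else
        node->is_word_end = true;                 // last char matched
}

// Exact-match search following the same three-way rule.
bool contains(const Node* node, const std::string& word)
{
    std::size_t i = 0;
    while (node && i < word.size()) {
        if (word[i] < node->split)
            node = node->lo.get();
        else if (word[i] > node->split)
            node = node->hi.get();
        else if (i + 1 == word.size())
            return node->is_word_end;
        else {
            node = node->eq.get();
            ++i;
        }
    }
    return false;
}
```

Inserting “bat”, “mad” and “mat” in that order makes 'b' the root; 'm' becomes its HIKID, and “mat” shares the 'm' and 'a' nodes with “mad”.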

If we study the Insert function more closely, we see that once the end of the word being inserted is reached, i.e. the ‘\0’ character is reached (at which point the variable lastnodeinitialized becomes TRUE), it stores two pointers in that node: one to the word itself and one to its meaning. This can be seen in the code below:

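if (*SplitChar == '\0' && no_of_recursion)
{
    // The '\0' terminator was reached: mark the end of the word.
    lastnodeinitialized = TRUE;
    no_of_recursion--;
}

if (lastnodeinitialized && nodeptr)
{
    // Store pointers to the word and its meaning, then reset the insertion state.
    nodeptr->originalstring = originalstring;
    nodeptr->meaning = meaning;
    lastnodeinitialized = FALSE;
    no_of_recursion = 0;
    originalstring = NULL;
}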

Fig: Snippet from Insert function
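Keeping the word and its meaning on the end-of-word node means a meaning lookup (the job of OnButton2) needs only one descent through the tree. A minimal sketch, assuming a std::string-based node rather than CTSTNode's char pointers; Node, insert and lookup_meaning are hypothetical names, and the sample meanings in the usage note are illustrative only.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Sketch node: 'word' and 'meaning' are filled only where a word ends,
// in the spirit of originalstring/meaning in CTSTNode.
struct Node {
    char split;
    bool is_word_end = false;
    std::string word, meaning;
    std::unique_ptr<Node> lo, eq, hi;
    explicit Node(char c) : split(c) {}
};

void insert(std::unique_ptr<Node>& root, const std::string& word,
            const std::string& meaning)
{
    if (word.empty()) return;
    std::unique_ptr<Node>* slot = &root;   // child pointer being followed
    std::size_t i = 0;
    for (;;) {
        if (!*slot) *slot = std::make_unique<Node>(word[i]);
        Node* n = slot->get();
        if (word[i] < n->split) {
            slot = &n->lo;
        } else if (word[i] > n->split) {
            slot = &n->hi;
        } else if (i + 1 < word.size()) {
            slot = &n->eq;
            ++i;
        } else {
            // End of the word: keep the word and its meaning on this node.
            n->is_word_end = true;
            n->word = word;
            n->meaning = meaning;
            return;
        }
    }
}

// Returns the stored meaning, or nullptr if the word is not in the tree.
const std::string* lookup_meaning(const Node* n, const std::string& word)
{
    std::size_t i = 0;
    while (n && !word.empty()) {
        if (word[i] < n->split) n = n->lo.get();
        else if (word[i] > n->split) n = n->hi.get();
        else if (i + 1 < word.size()) { n = n->eq.get(); ++i; }
        else return n->is_word_end ? &n->meaning : nullptr;
    }
    return nullptr;
}
```

After insert(root, "bat", "a flying mammal"), lookup_meaning(root.get(), "bat") points at that meaning, while lookup_meaning(root.get(), "ba") returns nullptr because no word ends at the 'a' node.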

The CTernarySearchTree class has other member functions such as NearSearch, PartialMatch, Search, Traverse and Traverse_And_Match.

Of these, NearSearch performs a neighbour search for a string within a given Hamming distance. To try it, type the word “bat” and click the “More Words” button. The application searches within distance 2, as the following line of code shows:

void CEnglishDictionaryDlg::OnButton3()
 {
 ……
 test->NearSearch(root,str.GetBuffer(str.GetLength()),2); 
 ……
 }

The same function can also be used for spell checking: the words near a misspelled word are candidate corrections.
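A near search can be sketched along the lines of the near-neighbour search described in the Bentley and Sedgewick article listed under Reference. This C++ version is an adaptation under stated assumptions, not the article's NearSearch: word ends are flagged on the last character's node, each mismatched character costs 1, and each extra or missing trailing character also costs 1. The names Node, insert and near_search are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch node (hypothetical names, not the article's CTSTNode).
struct Node {
    char split;
    bool is_word_end = false;
    std::string word;                  // full word, stored where it ends
    std::unique_ptr<Node> lo, eq, hi;
    explicit Node(char c) : split(c) {}
};

void insert(std::unique_ptr<Node>& node, const std::string& w, std::size_t i = 0)
{
    if (i >= w.size()) return;
    if (!node) node = std::make_unique<Node>(w[i]);
    if (w[i] < node->split)      insert(node->lo, w, i);
    else if (w[i] > node->split) insert(node->hi, w, i);
    else if (i + 1 < w.size())   insert(node->eq, w, i + 1);
    else { node->is_word_end = true; node->word = w; }
}

// Appends to 'out' every word within distance d of s[i..]. A mismatched
// character costs 1; so does each extra or missing character at the end.
void near_search(const Node* n, const std::string& s, std::size_t i, int d,
                 std::vector<std::string>& out)
{
    if (!n || d < 0) return;
    const bool have = i < s.size();    // query characters left?

    // With budget left, every sibling is a candidate; otherwise follow s[i].
    if (d > 0 || (have && s[i] < n->split))
        near_search(n->lo.get(), s, i, d, out);

    // Use this node's character at position i (free only on an exact match).
    const int left = d - ((have && s[i] == n->split) ? 0 : 1);
    const std::size_t next = have ? i + 1 : i;
    if (left >= 0) {
        // A word ending here qualifies if the unmatched query tail fits the budget.
        if (n->is_word_end && s.size() - next <= static_cast<std::size_t>(left))
            out.push_back(n->word);
        near_search(n->eq.get(), s, next, left, out);
    }

    if (d > 0 || (have && s[i] > n->split))
        near_search(n->hi.get(), s, i, d, out);
}
```

Searching “bat” with a budget of 2 over bat, mad, mat, rat, sad, sat, cat, bats, dog and ba returns every word except dog (distance 3). The words come out in ascending order because the low subtree, the node's own word, the equal subtree and the high subtree are visited in that order; with a budget of 0 the search degenerates to an exact lookup.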

The function Traverse walks the whole tree, adding every stored word to strList to fill the main list box that shows all the words.

void CTernarySearchTree::Traverse(CTSTNode* nodeptr)
{
    if (!nodeptr) return;
    Traverse(nodeptr->LOKID);
    if (nodeptr->cSplitChar)
    {
        Traverse(nodeptr->EQKID);
    }
    if (nodeptr->originalstring)
    {
        //AfxMessageBox(nodeptr->originalstring);
        strList.AddHead(CString(nodeptr->originalstring));
    }
    Traverse(nodeptr->HIKID);
}

Fig: The Traverse function.
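An in-order walk of a ternary search tree that visits the low subtree, then the word ending at the current node, then the equal subtree, then the high subtree yields the words in ascending character order: everything in the low subtree has a smaller character at this position, a word ending here is a prefix of every word in the equal subtree, and everything in the high subtree is larger. A minimal standalone sketch, with hypothetical names and word ends flagged on the last character's node (unlike the article's layout):

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch node (hypothetical names, not the article's CTSTNode).
struct Node {
    char split;
    bool is_word_end = false;
    std::string word;                  // full word, stored where it ends
    std::unique_ptr<Node> lo, eq, hi;
    explicit Node(char c) : split(c) {}
};

void insert(std::unique_ptr<Node>& node, const std::string& w, std::size_t i = 0)
{
    if (i >= w.size()) return;
    if (!node) node = std::make_unique<Node>(w[i]);
    if (w[i] < node->split)      insert(node->lo, w, i);
    else if (w[i] > node->split) insert(node->hi, w, i);
    else if (i + 1 < w.size())   insert(node->eq, w, i + 1);
    else { node->is_word_end = true; node->word = w; }
}

// In-order walk: low subtree, word ending here, equal subtree, high subtree.
void traverse(const Node* n, std::vector<std::string>& out)
{
    if (!n) return;
    traverse(n->lo.get(), out);
    if (n->is_word_end) out.push_back(n->word);
    traverse(n->eq.get(), out);
    traverse(n->hi.get(), out);
}
```

Whatever the insertion order, the collected list comes out sorted, so it can fill a list box directly.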

PartialMatch fills the list with the words that match what has been typed so far. It first calls Search to find the node for the typed prefix and then calls Traverse_And_Match on that node.

void CTernarySearchTree::PartialMatch(CTSTNode* nodeptr, char* String)
{
    // Locate the node reached by the typed prefix.
    CTSTNode* Found_At = Search(nodeptr, String);
    if (!Found_At) return;   // nothing matches the prefix

    // Walk the tree from that node, matching against the prefix.
    Traverse_And_Match(Found_At, String);
}

Fig: The PartialMatch function.
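The two steps of a prefix match, finding the prefix node and then walking below it, can be sketched as follows. This is a standalone illustration with hypothetical names (find_prefix, collect, prefix_match), not the article's Search or Traverse_And_Match: the word that ends exactly at the prefix node is reported first, followed by every word in that node's equal subtree, and an empty prefix returns all words.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Sketch node (hypothetical names, not the article's CTSTNode).
struct Node {
    char split;
    bool is_word_end = false;
    std::string word;                  // full word, stored where it ends
    std::unique_ptr<Node> lo, eq, hi;
    explicit Node(char c) : split(c) {}
};

void insert(std::unique_ptr<Node>& node, const std::string& w, std::size_t i = 0)
{
    if (i >= w.size()) return;
    if (!node) node = std::make_unique<Node>(w[i]);
    if (w[i] < node->split)      insert(node->lo, w, i);
    else if (w[i] > node->split) insert(node->hi, w, i);
    else if (i + 1 < w.size())   insert(node->eq, w, i + 1);
    else { node->is_word_end = true; node->word = w; }
}

// Collects all words in a subtree in ascending order.
void collect(const Node* n, std::vector<std::string>& out)
{
    if (!n) return;
    collect(n->lo.get(), out);
    if (n->is_word_end) out.push_back(n->word);
    collect(n->eq.get(), out);
    collect(n->hi.get(), out);
}

// Returns the node holding the last character of a non-empty prefix, or nullptr.
const Node* find_prefix(const Node* n, const std::string& prefix)
{
    std::size_t i = 0;
    while (n) {
        if (prefix[i] < n->split)       n = n->lo.get();
        else if (prefix[i] > n->split)  n = n->hi.get();
        else if (++i == prefix.size())  return n;
        else                            n = n->eq.get();
    }
    return nullptr;
}

std::vector<std::string> prefix_match(const Node* root, const std::string& prefix)
{
    std::vector<std::string> out;
    if (prefix.empty()) { collect(root, out); return out; }
    const Node* n = find_prefix(root, prefix);
    if (!n) return out;                        // no word has this prefix
    if (n->is_word_end) out.push_back(n->word); // the prefix itself is a word
    collect(n->eq.get(), out);                 // every longer word sharing it
    return out;
}
```

For example, with bat, bad, ba, bats, cat and mad loaded, prefix_match(root.get(), "ba") returns ba, bad, bat and bats, while a prefix such as "x" returns an empty list.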

Conclusion

This kind of application can serve as the basis for a dictionary application on mobile phones. The NearSearch algorithm can be used for spell checking, and the PartialMatch functionality for a phone book on a mobile device.

Reference

The article “Ternary Search Trees” by Jon Bentley and Bob Sedgewick that appeared in Dr. Dobb’s Journal.

History

8th April, 2006: Initial post