English Dictionary

8 Apr 2006 · CPOL
The implementation of an English Dictionary using Ternary Search Trees

Introduction

In this article, I implement an English dictionary application using a ternary search tree, hosted in an MFC dialog-based application with an input field and a list of words. The application does prefix matching: as we type in the input field, it filters the unmatched words out of the list. It also does a neighbor search for a given word, i.e. it lists the words that closely match the word we have typed.

For example, suppose the dictionary contains the word “bat”. If we type “bat” and click the “More Words” button, the application lists words such as “bat”, “mad”, “mat”, “rat”, “sad” and “sat”. Naturally, these words must already be present in the tree.
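
To make “closely match” concrete: the near search described later in this article works within a Hamming distance, i.e. the number of positions at which two words of equal length differ. The sketch below is not part of the application; hammingDistance is a hypothetical helper that only illustrates why “mad” and “sad” count as neighbors of “bat” at distance 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from the article's source code.
// Counts the positions at which two equal-length words differ.
// Returns -1 when the lengths differ, because Hamming distance is
// only defined for strings of the same length.
int hammingDistance(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return -1;
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] != b[i]) ++distance;
    }
    return distance;
}
```

Under this definition, “mat”, “rat” and “sat” are at distance 1 from “bat”, while “mad” and “sad” are at distance 2.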

We have another output field called “Meaning”, which will show the meaning of a word typed in the input box.

Explanation of the Code

Two main classes implement the dictionary: CTernarySearchTree and CTSTNode.

The dialog class owns the ternary search tree class, which in turn uses the node class.

Let’s delve into the dialog class. Its main functions are OnButton1, OnChangeEdit1, OnButton2 and OnButton3. When the project was first implemented, clicking a button (Button1) loaded the tree with data. That button is now hidden, and its handler (OnButton1) is called from OnInitDialog instead. OnButton2 displays the meaning of a word. OnChangeEdit1 does the prefix matching as we type. OnButton3 shows the words found by the near search.

That covers the front end of the dictionary application. The main logic lies in the classes CTSTNode and CTernarySearchTree, which are discussed next.

CTSTNode represents a node of the tree. Because the tree is a ternary search tree, each node has three subtrees, referred to as LOKID, EQKID and HIKID. A node also holds a reference to the original string and a reference to its meaning, both of which are loaded from a text file. Finally, CTSTNode has a character member called cSplitChar.

C++
class CTSTNode
{
public:
    friend class CTernarySearchTree;
    friend class CEnglishDictionaryDlg;
    CTSTNode();
    CTSTNode(CTSTNode* p, char* SplitChar)
    {
        cSplitChar = *SplitChar;         // character this node splits on
        EQKID = p;
        LOKID = HIKID = PARENT = NULL;   // avoid leaving pointers uninitialized
        originalstring = NULL;
        meaning = NULL;
    }
    virtual ~CTSTNode();

private:
    //enum ID {PARENT=0, LOKID, EQKID, HIKID};
    char cSplitChar;
    CTSTNode *LOKID, *HIKID, *EQKID, *PARENT;
    char* originalstring;   // set only on the node where a word ends
    char* meaning;          // meaning of that word
};
Fig: Class declaration of CTSTNode

While inserting a word into the tree, the logic takes one character of the string (say SplitChar) and compares it with the current node’s cSplitChar. If SplitChar comes alphabetically before cSplitChar, insertion continues in the LOKID subtree of the current node. If it comes after cSplitChar, insertion continues in the HIKID subtree. If the two are equal, the logic advances to the next character of the string and continues in the EQKID subtree. The process repeats until the end of the string is reached. This logic can be seen in the “Insert” function of the CTernarySearchTree class, which is given below:

C++
if (*SplitChar != '\0')
{
    no_of_recursion++;
    if (nodeptr == NULL)
    {
        // No node here yet: create one for the current character.
        nodeptr = new CTSTNode(nodeptr, SplitChar);
        nodeptr->LOKID = nodeptr->HIKID = nodeptr->EQKID = NULL;
    }

    if (*SplitChar < nodeptr->cSplitChar)
    {
        nodeptr->LOKID = Insert(nodeptr->LOKID, SplitChar, meaning);
    }
    else if (*SplitChar == nodeptr->cSplitChar)
    {
        // Match: move on to the next character of the word.
        nodeptr->EQKID = Insert(nodeptr->EQKID, ++SplitChar, meaning);
    }
    else
    {
        nodeptr->HIKID = Insert(nodeptr->HIKID, SplitChar, meaning);
    }
}
Fig: Snippet from Insert function

A closer look at the Insert function shows that once the end of the word being inserted is reached, i.e. the ‘\0’ character is reached and the variable lastnodeinitialized becomes true, the function stores two references inside the node: one to the word itself and one to its meaning. This can be seen in the code below:

C++
// End of the word reached after at least one recursive step.
if (*SplitChar == '\0' && no_of_recursion)
{
    lastnodeinitialized = TRUE;
    no_of_recursion--;
}

// Attach the word and its meaning to the node, then reset the state.
if (lastnodeinitialized && nodeptr)
{
    nodeptr->originalstring = originalstring;
    nodeptr->meaning = meaning;
    lastnodeinitialized = FALSE;
    no_of_recursion = 0;
    originalstring = NULL;
}
Fig: Snippet from Insert function
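
The two snippets above depend on member state (no_of_recursion, lastnodeinitialized) that is not shown here. As a self-contained illustration of the same idea, the following is a minimal sketch under stated assumptions, not the application's code: Node, insert and lookup are hypothetical names, std::string replaces the raw char pointers, the word's end is marked with a flag on the node of its last character, and nodes are never freed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical node type for this sketch; it is not the article's CTSTNode.
struct Node
{
    char splitChar;
    Node* lo = nullptr;       // characters smaller than splitChar
    Node* eq = nullptr;       // next character of the same word
    Node* hi = nullptr;       // characters larger than splitChar
    bool isWordEnd = false;   // true when a word ends at this node
    std::string meaning;      // filled in only on word-ending nodes
    explicit Node(char c) : splitChar(c) {}
};

// Inserts word[pos..] below node and returns the (possibly new) subtree root.
// Assumes word is non-empty. Nodes are leaked; this is only a sketch.
Node* insert(Node* node, const std::string& word, std::size_t pos,
             const std::string& meaning)
{
    const char c = word[pos];
    if (!node) node = new Node(c);

    if (c < node->splitChar)
        node->lo = insert(node->lo, word, pos, meaning);
    else if (c > node->splitChar)
        node->hi = insert(node->hi, word, pos, meaning);
    else if (pos + 1 < word.size())
        node->eq = insert(node->eq, word, pos + 1, meaning);
    else
    {
        // Last character matched: mark the word and store its meaning.
        node->isWordEnd = true;
        node->meaning = meaning;
    }
    return node;
}

// Returns a pointer to the stored meaning, or nullptr if the word is absent.
const std::string* lookup(const Node* node, const std::string& word)
{
    std::size_t pos = 0;
    while (node && pos < word.size())
    {
        if (word[pos] < node->splitChar)      node = node->lo;
        else if (word[pos] > node->splitChar) node = node->hi;
        else if (pos + 1 == word.size())
            return node->isWordEnd ? &node->meaning : nullptr;
        else { node = node->eq; ++pos; }
    }
    return nullptr;
}
```

With this sketch, a call such as root = insert(root, "bat", 0, "a flying mammal") followed by lookup(root, "bat") yields the stored meaning, while lookup(root, "ba") yields nullptr because no word ends at the “a” node.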

The CTernarySearchTree class has other member functions such as NearSearch, PartialMatch, Search, Traverse and Traverse_And_Match.

Of these, NearSearch does a neighbor search for a string within a given Hamming distance. To try it, type the word “Bat” and click the “More Words” button. The application searches within distance 2, as the following line of code shows:

C++
void CEnglishDictionaryDlg::OnButton3()
{
    ……
    test->NearSearch(root, str.GetBuffer(str.GetLength()), 2);
    ……
}

This same function can be used for spell checking.
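
The body of NearSearch is not reproduced in this article. The following is a minimal sketch of how a Hamming-distance search over a ternary search tree can work, in the spirit of the near-neighbor search from the Bentley and Sedgewick article. It is an assumption-laden illustration rather than the application's implementation: Node, insert, nearSearch and nearWords are hypothetical names, only words of the same length as the query are returned, and nodes are never freed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch; it is not the article's CTSTNode.
struct Node
{
    char splitChar;
    Node* lo = nullptr;
    Node* eq = nullptr;
    Node* hi = nullptr;
    bool isWordEnd = false;   // true when a word ends at this node
    explicit Node(char c) : splitChar(c) {}
};

// Inserts a non-empty word; nodes are leaked because this is only a sketch.
Node* insert(Node* node, const std::string& word, std::size_t pos = 0)
{
    const char c = word[pos];
    if (!node) node = new Node(c);
    if (c < node->splitChar)        node->lo = insert(node->lo, word, pos);
    else if (c > node->splitChar)   node->hi = insert(node->hi, word, pos);
    else if (pos + 1 < word.size()) node->eq = insert(node->eq, word, pos + 1);
    else                            node->isWordEnd = true;
    return node;
}

// Collects every stored word of the query's length that differs from the
// query in at most `budget` of the remaining positions.
void nearSearch(const Node* node, const std::string& query, std::size_t pos,
                int budget, std::string& prefix, std::vector<std::string>& out)
{
    if (!node || budget < 0 || pos >= query.size()) return;
    const char c = query[pos];

    // lo and hi hold alternative characters for the same position. They are
    // worth visiting if a mismatch is still affordable or a match may lie there.
    if (budget > 0 || c < node->splitChar)
        nearSearch(node->lo, query, pos, budget, prefix, out);

    // Taking this node's character costs one unit when it mismatches.
    const int remaining = (c == node->splitChar) ? budget : budget - 1;
    if (remaining >= 0)
    {
        prefix.push_back(node->splitChar);
        if (pos + 1 == query.size())
        {
            if (node->isWordEnd) out.push_back(prefix);
        }
        else
        {
            nearSearch(node->eq, query, pos + 1, remaining, prefix, out);
        }
        prefix.pop_back();
    }

    if (budget > 0 || c > node->splitChar)
        nearSearch(node->hi, query, pos, budget, prefix, out);
}

// Convenience wrapper that returns the matches in alphabetical order.
std::vector<std::string> nearWords(const Node* root, const std::string& query,
                                   int distance)
{
    std::vector<std::string> out;
    std::string prefix;
    nearSearch(root, query, 0, distance, prefix, out);
    return out;
}
```

For a tree holding “bat”, “mad”, “mat”, “rat”, “sad”, “sat”, “cow” and “boat”, nearWords(root, "bat", 2) returns the first six words. “cow” differs in all three positions, and “boat” has a different length, so neither is reported by this sketch.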

The function Traverse traverses the whole tree and fills the main list box which shows all the words.

C++
void CTernarySearchTree::Traverse(CTSTNode* nodeptr)
{
    if (!nodeptr) return;
    Traverse(nodeptr->LOKID);
    if (nodeptr->cSplitChar)
    {
        Traverse(nodeptr->EQKID);
    }
    if (nodeptr->originalstring)
    {
        //AfxMessageBox(nodeptr->originalstring);
        strList.AddHead(CString(nodeptr->originalstring));
    }
    Traverse(nodeptr->HIKID);
}
Fig: The Traverse function.
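
For readers who want to experiment outside MFC, here is a minimal standalone sketch of a full traversal. It is not the application's Traverse: Node, insert and collect are hypothetical names, words are rebuilt from the characters along the path instead of being read from a stored string, and results go into a std::vector in ascending alphabetical order rather than being added to the head of a CStringList.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch; it is not the article's CTSTNode.
struct Node
{
    char splitChar;
    Node* lo = nullptr;
    Node* eq = nullptr;
    Node* hi = nullptr;
    bool isWordEnd = false;   // true when a word ends at this node
    explicit Node(char c) : splitChar(c) {}
};

// Inserts a non-empty word; nodes are leaked because this is only a sketch.
Node* insert(Node* node, const std::string& word, std::size_t pos = 0)
{
    const char c = word[pos];
    if (!node) node = new Node(c);
    if (c < node->splitChar)        node->lo = insert(node->lo, word, pos);
    else if (c > node->splitChar)   node->hi = insert(node->hi, word, pos);
    else if (pos + 1 < word.size()) node->eq = insert(node->eq, word, pos + 1);
    else                            node->isWordEnd = true;
    return node;
}

// In-order walk: smaller characters, then this character (and every word
// that continues through it), then larger characters.
void collect(const Node* node, std::string& prefix,
             std::vector<std::string>& out)
{
    if (!node) return;
    collect(node->lo, prefix, out);
    prefix.push_back(node->splitChar);
    if (node->isWordEnd) out.push_back(prefix);   // a word ends here
    collect(node->eq, prefix, out);               // longer words, same prefix
    prefix.pop_back();
    collect(node->hi, prefix, out);
}
```

Because a word is emitted before its EQKID-style subtree is visited, a word such as “bat” appears before “bats” in the output of this sketch.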

PartialMatch fills the list with the partially matched words as we type.

C++
void CTernarySearchTree::PartialMatch(CTSTNode* nodeptr, char* String)
{
    CTSTNode* Found_At = Search(nodeptr, String);
    if (!Found_At) return;

    Traverse_And_Match(Found_At, String);
}
Fig: The PartialMatch function.
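
Search and Traverse_And_Match are not shown in this article, so the following is only a sketch of the two-step idea that PartialMatch expresses: locate the node where the typed prefix ends, then list every word below it. All names (Node, insert, collect, prefixMatch) are hypothetical, and nodes are never freed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for this sketch; it is not the article's CTSTNode.
struct Node
{
    char splitChar;
    Node* lo = nullptr;
    Node* eq = nullptr;
    Node* hi = nullptr;
    bool isWordEnd = false;   // true when a word ends at this node
    explicit Node(char c) : splitChar(c) {}
};

// Inserts a non-empty word; nodes are leaked because this is only a sketch.
Node* insert(Node* node, const std::string& word, std::size_t pos = 0)
{
    const char c = word[pos];
    if (!node) node = new Node(c);
    if (c < node->splitChar)        node->lo = insert(node->lo, word, pos);
    else if (c > node->splitChar)   node->hi = insert(node->hi, word, pos);
    else if (pos + 1 < word.size()) node->eq = insert(node->eq, word, pos + 1);
    else                            node->isWordEnd = true;
    return node;
}

// Appends, in alphabetical order, every word stored in the subtree.
void collect(const Node* node, std::string& prefix,
             std::vector<std::string>& out)
{
    if (!node) return;
    collect(node->lo, prefix, out);
    prefix.push_back(node->splitChar);
    if (node->isWordEnd) out.push_back(prefix);
    collect(node->eq, prefix, out);
    prefix.pop_back();
    collect(node->hi, prefix, out);
}

// Returns every stored word that starts with the given prefix.
std::vector<std::string> prefixMatch(const Node* root, const std::string& prefix)
{
    std::vector<std::string> out;
    std::string current = prefix;
    if (prefix.empty())
    {
        collect(root, current, out);   // an empty prefix matches every word
        return out;
    }

    // Step 1: walk down to the node holding the last character of the prefix.
    const Node* node = root;
    std::size_t pos = 0;
    while (node)
    {
        if (prefix[pos] < node->splitChar)      node = node->lo;
        else if (prefix[pos] > node->splitChar) node = node->hi;
        else if (++pos == prefix.size())        break;
        else                                    node = node->eq;
    }
    if (!node) return out;             // no stored word has this prefix

    // Step 2: the prefix itself may be a word; then list everything below it.
    if (node->isWordEnd) out.push_back(prefix);
    collect(node->eq, current, out);
    return out;
}
```

In a dialog like the one described here, a function of this kind would be called from the edit control's change handler with the current text, and the returned words would replace the contents of the list box.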

Conclusion

This kind of application can serve as the basis for a dictionary application on mobile phones. The NearSearch algorithm can be used for spell checking, and the PartialMatch functionality can be used to build a phone book on a mobile device.

Reference

  • The article “Ternary Search Trees” by Jon Bentley and Bob Sedgewick that appeared in Dr. Dobb’s Journal.

History

  • 8th April, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect som-itsolutions
India India
The best way I can describe myself is as a dream chaser. At the beginning of my career I was in the marketing department of a big telecom company, which hardly added any value to my curiosity, and I was hell-bent on jumping into software because that was the only way to learn the nitty-gritty of the hardcore technical aspects. Hence I started by learning C++/VC++. In the beginning it was really difficult without much idea about programming. Moreover, there was no Google. I took a little more time to pick things up. There was no training. Absolutely no help from anybody. No broadband internet. No computer at home. It was really difficult for me. But I did not stop dreaming. I used to dream and tell my colleagues that C++ is not as much about programming as about designing. It is more about a technique for moving from the problem domain to the solution domain. However, I hardly got any support from the organizations where I worked. It was only when I got a PC at home that I started walking towards my goal. The early morning rise, innumerable visits to technical book stores in Bangalore, googling and traversing from one link to another in search of technical and C++ content, becoming tired after office hours, all were part of it. But still the road was difficult. I was not able to join the dots. And then, when I started going through the Design Patterns book, the actual joy of learning began. I still remember how I used to go through the MFC source code to map different GoF patterns in the Doc-View architecture, the command-routing architecture and so forth. However, I was not much aware of the open source communities. Then, when Google made their Android framework open, it was a boon for me. I picked up many unknown areas and started looking into code from a designer’s perspective. When I started understanding the Android framework code, I thought I was really able to join the dots: the dots between the dream and the reality of becoming an able software engineer.
