English Dictionary
The implementation of an English Dictionary using Ternary Search Trees
Introduction
In this article, I implement an English dictionary using a ternary search tree, wrapped in an MFC dialog-based application that has an input field and a list of words. As we type in the input field, the application performs prefix matching and filters the unmatched words out of the list. It also performs a neighbor search for a given word, i.e. it lists nearby words that closely match the word we have typed in.
For example, suppose the dictionary contains the word “bat”. If we type “bat” and click the “More Words” button, the application lists words such as “bat”, “mad”, “mat”, “rat”, “sad” and “sat”. Naturally, these words appear only if they are present in the tree.
Another output field, “Meaning”, shows the meaning of the word typed in the input box.
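To see why “mad” and “sad” count as near words of “bat”, count the letter positions at which each differs from it; this count is the Hamming distance that the near search uses. The following is a minimal standalone C++ sketch; HammingDistance is a hypothetical helper, not a function from the article's code:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the article's code.
// Returns the number of positions at which two equal-length words differ,
// or -1 when the lengths differ (Hamming distance is undefined then).
int HammingDistance(const std::string& a, const std::string& b)
{
    if (a.size() != b.size()) return -1;
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        if (a[i] != b[i]) ++distance;  // count each mismatched position
    }
    return distance;
}
```

For instance, HammingDistance("bat", "mad") is 2 (b/m and t/d differ) and HammingDistance("bat", "sat") is 1, so both words lie within the distance of 2 that the application uses.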
Explanation of the Code
The dictionary is implemented by two main classes, CTernarySearchTree and CTSTNode.
The dialog class, CEnglishDictionaryDlg, owns the ternary search tree class, which in turn uses the node class.
Let’s look at the dialog class first. Its main functions are:
- OnButton1 loads the tree with data. When the project was first implemented, this handler ran when a button (Button1) was clicked; that button is now hidden, and OnButton1 is called from OnInitDialog instead.
- OnChangeEdit1 performs the prefix matching as we type.
- OnButton2 displays the meaning.
- OnButton3 shows the words found by the near search.
That covers the front end of the dictionary application. The main logic lies in the classes CTSTNode and CTernarySearchTree, which we discuss next.
CTSTNode represents a single node of the tree. Because the tree is a ternary search tree, each node has three subtrees, referred to as LOKID, EQKID and HIKID. A node also holds a pointer to the original string (loaded from a text file) and a pointer to its meaning (also loaded from a text file). Finally, each node has a character variable, cSplitChar, which is compared against the characters of the word being inserted or searched for.
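class CTSTNode
{
public:
friend class CTernarySearchTree;
friend class CEnglishDictionaryDlg;
CTSTNode();
// Creates a node for *SplitChar; p becomes its EQKID.
CTSTNode(CTSTNode* p, char* SplitChar)
{
cSplitChar = *SplitChar;
EQKID = p;
LOKID = HIKID = PARENT = NULL;   // initialize the remaining links too
originalstring = NULL;
meaning = NULL;
}
virtual ~CTSTNode();
private:
//enum ID {PARENT=0, LOKID, EQKID, HIKID};
char cSplitChar;                            // character compared at this node
CTSTNode *LOKID, *HIKID, *EQKID, *PARENT;   // low, high and equal children; parent
char* originalstring;                       // the stored word, set where a word ends
char* meaning;                              // its meaning, set where a word ends
};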
When a word is inserted into the tree, the logic takes one character of the word (say SplitChar) and compares it with the current node’s cSplitChar (by character code, which is alphabetical order for lowercase letters). If SplitChar comes before cSplitChar, insertion continues in the current node’s LOKID subtree with the same character. If SplitChar comes after cSplitChar, insertion continues in the HIKID subtree, again with the same character. If the two are equal, the character has been matched, so the logic moves on to the next character of the word and continues in the EQKID subtree. Whenever the subtree to descend into is empty, a new node is created for the current character. This logic can be seen in the “Insert” function of the CTernarySearchTree class:
if (*SplitChar != '\0')   // still inside the word being inserted
{
no_of_recursion++;
if(nodeptr == NULL)   // empty slot: create a node for this character
{
nodeptr = new CTSTNode(nodeptr, SplitChar);
nodeptr->LOKID = nodeptr->HIKID = nodeptr->EQKID = NULL;
}
if(*SplitChar < nodeptr->cSplitChar)   // smaller: same character, go low
{
nodeptr->LOKID = Insert(nodeptr->LOKID,SplitChar, meaning);
}
else if (*SplitChar == nodeptr->cSplitChar)   // match: next character, go equal
{
nodeptr->EQKID = Insert(nodeptr->EQKID, ++SplitChar, meaning);
}
else   // larger: same character, go high
{
nodeptr->HIKID = Insert(nodeptr->HIKID, SplitChar, meaning);
}
}
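For readers who want to experiment outside MFC, the same three-way descent can be sketched in standard C++ (C++14 or later). This is a minimal sketch, not the article's code: Node, Insert, Contains and CountNodes are hypothetical names, children are owned by std::unique_ptr instead of raw pointers, and a flag on the node of a word's last character marks where a word ends:

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical sketch, not the article's classes: one node per character,
// with three children in the roles of CTSTNode's LOKID, EQKID and HIKID.
struct Node
{
    char splitChar;
    std::unique_ptr<Node> lo, eq, hi;  // LOKID, EQKID, HIKID
    bool endsWord = false;             // true on the node of a word's last character
    explicit Node(char c) : splitChar(c) {}
};

// Inserts word[pos..] into the subtree rooted at node, creating nodes as needed.
void Insert(std::unique_ptr<Node>& node, const std::string& word, std::size_t pos = 0)
{
    if (pos >= word.size()) return;               // nothing (left) to insert
    char c = word[pos];
    if (!node) node = std::make_unique<Node>(c);  // empty slot: new node for c
    if (c < node->splitChar)
        Insert(node->lo, word, pos);              // smaller: same character, go low
    else if (c > node->splitChar)
        Insert(node->hi, word, pos);              // larger: same character, go high
    else if (pos + 1 < word.size())
        Insert(node->eq, word, pos + 1);          // match: next character, go equal
    else
        node->endsWord = true;                    // matched the word's last character
}

// Returns true if word was inserted as a complete word.
bool Contains(const Node* node, const std::string& word)
{
    std::size_t pos = 0;
    while (node && pos < word.size())
    {
        char c = word[pos];
        if (c < node->splitChar) node = node->lo.get();
        else if (c > node->splitChar) node = node->hi.get();
        else if (pos + 1 == word.size()) return node->endsWord;
        else { node = node->eq.get(); ++pos; }
    }
    return false;
}

// Counts the nodes in a subtree.
std::size_t CountNodes(const Node* node)
{
    if (!node) return 0;
    return 1 + CountNodes(node->lo.get()) + CountNodes(node->eq.get())
             + CountNodes(node->hi.get());
}
```

Inserting “bat”, “bad” and “cat” into this sketch creates seven nodes: “bad” reuses the b and a nodes of “bat” and adds a d node in the LOKID position of t, while “cat” starts a new branch in the HIKID position of the root.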
Looking more closely at the Insert function, once the end of the word being inserted is reached, i.e. the ‘\0’ character is encountered and the variable lastnodeinitialized becomes TRUE, the function stores two pointers in the current node: one to the word itself and one to its meaning. This can be seen in the code below:
if (*SplitChar == '\0' && no_of_recursion)   // reached the end of a non-empty word
{
lastnodeinitialized = TRUE;
no_of_recursion--;
}
if(lastnodeinitialized && nodeptr)   // store the word and its meaning in this node
{
nodeptr->originalstring = originalstring;
nodeptr->meaning = meaning;
// reset the bookkeeping for the next word
lastnodeinitialized = FALSE;
no_of_recursion = 0;
originalstring = NULL;
}
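Storing the meaning on the node where a word ends is what makes a lookup like the “Meaning” field possible: a search for the word stops at that node, and the meaning is read from it. A minimal C++17 sketch, assuming the word ends on the node of its last character and using std::optional<std::string> in place of the article's char* meaning pointer; DictNode, InsertWithMeaning and LookupMeaning are hypothetical names:

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>

// Hypothetical sketch (C++17): the meaning is kept on the node where a word ends,
// in place of the article's char* originalstring / char* meaning pointers.
struct DictNode
{
    char splitChar;
    std::unique_ptr<DictNode> lo, eq, hi;
    std::optional<std::string> meaning;  // engaged only where a word ends
    explicit DictNode(char c) : splitChar(c) {}
};

// Inserts word and stores its meaning on the node of its last character.
void InsertWithMeaning(std::unique_ptr<DictNode>& root, const std::string& word,
                       const std::string& meaning)
{
    if (word.empty()) return;
    std::unique_ptr<DictNode>* link = &root;  // the child slot being examined
    std::size_t pos = 0;
    while (true)
    {
        char c = word[pos];
        if (!*link) *link = std::make_unique<DictNode>(c);
        DictNode* node = link->get();
        if (c < node->splitChar) link = &node->lo;
        else if (c > node->splitChar) link = &node->hi;
        else if (pos + 1 < word.size()) { link = &node->eq; ++pos; }
        else { node->meaning = meaning; return; }  // end of word: store the meaning
    }
}

// Returns the stored meaning, or std::nullopt if word is not a stored word.
std::optional<std::string> LookupMeaning(const DictNode* node, const std::string& word)
{
    std::size_t pos = 0;
    while (node && pos < word.size())
    {
        char c = word[pos];
        if (c < node->splitChar) node = node->lo.get();
        else if (c > node->splitChar) node = node->hi.get();
        else if (pos + 1 == word.size()) return node->meaning;
        else { node = node->eq.get(); ++pos; }
    }
    return std::nullopt;
}
```

A prefix such as “ba” returns std::nullopt even after “bat” is inserted, because no word ends on the node of its last character.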
The CTernarySearchTree class has other member functions as well: NearSearch, PartialMatch, Search, Traverse and Traverse_And_Match.
Of these, NearSearch performs a neighbor search for a given string: it finds the words within a certain Hamming distance of it, i.e. the words of the same length that differ from it in at most that many positions. We can try it by typing the word “bat” and clicking the “More Words” button. The application searches within a distance of 2, as the following line of code shows:
void CEnglishDictionaryDlg::OnButton3()
{
……
test->NearSearch(root,str.GetBuffer(str.GetLength()),2);
……
}
This same function can be used for spell checking.
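The near search can be sketched in the style of the nearsearch routine from the Bentley and Sedgewick article listed in the Reference section, adapted here so that each word is stored on the node of its last character. In this minimal C++14 sketch (Node, Insert, NearSearch and NearSearchRec are hypothetical names), the low and high subtrees are explored when the remaining distance allows a mismatch or when the key's character could lie there, and following the equal child costs one unit of distance whenever the node's character differs from the key's. Only words of the key's length are reported, as Hamming distance requires:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: the complete word is stored on the node of its
// last character (similar in spirit to CTSTNode::originalstring).
struct Node
{
    char splitChar;
    std::unique_ptr<Node> lo, eq, hi;
    std::string word;  // non-empty only where a word ends
    explicit Node(char c) : splitChar(c) {}
};

void Insert(std::unique_ptr<Node>& root, const std::string& word)
{
    if (word.empty()) return;
    std::unique_ptr<Node>* link = &root;
    std::size_t pos = 0;
    while (true)
    {
        char c = word[pos];
        if (!*link) *link = std::make_unique<Node>(c);
        Node* node = link->get();
        if (c < node->splitChar) link = &node->lo;
        else if (c > node->splitChar) link = &node->hi;
        else if (pos + 1 < word.size()) { link = &node->eq; ++pos; }
        else { node->word = word; return; }
    }
}

// Collects words of key's length that differ from key in at most d positions.
void NearSearchRec(const Node* node, const std::string& key, std::size_t pos, int d,
                   std::vector<std::string>& out)
{
    if (!node || d < 0 || pos >= key.size()) return;
    char c = key[pos];
    // Smaller characters: visit if a mismatch is affordable or c may be there.
    if (d > 0 || c < node->splitChar) NearSearchRec(node->lo.get(), key, pos, d, out);
    // Taking this node's character at position pos costs 1 if it differs from c.
    int remaining = (c == node->splitChar) ? d : d - 1;
    if (remaining >= 0)
    {
        if (pos + 1 == key.size())
        {
            if (!node->word.empty()) out.push_back(node->word);  // same-length word
        }
        else
        {
            NearSearchRec(node->eq.get(), key, pos + 1, remaining, out);
        }
    }
    // Larger characters: same rule as for the smaller ones.
    if (d > 0 || c > node->splitChar) NearSearchRec(node->hi.get(), key, pos, d, out);
}

std::vector<std::string> NearSearch(const Node* root, const std::string& key, int d)
{
    std::vector<std::string> out;
    NearSearchRec(root, key, 0, d, out);
    return out;
}
```

With bat, bats, cat, dog, mad, mat, rat, sad and sat in the tree, NearSearch(root.get(), "bat", 2) returns bat, cat, mad, mat, rat, sad and sat in alphabetical order; dog differs in three positions, and bats has a different length.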
The function Traverse walks the whole tree and collects every stored word into strList, which is used to fill the main list box showing all the words.
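void CTernarySearchTree::Traverse(CTSTNode* nodeptr)
{
if (!nodeptr) return;
Traverse(nodeptr->LOKID);   // words with a smaller character at this position
if (nodeptr->cSplitChar)
{
Traverse(nodeptr->EQKID);   // longer words sharing this node's prefix
}
if(nodeptr->originalstring)   // a word is stored at this node
{
strList.AddHead(CString(nodeptr->originalstring));   // AddHead prepends to the list
}
Traverse(nodeptr->HIKID);   // words with a larger character at this position
}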
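An in-order walk of a ternary search tree can also produce the words already sorted. In the minimal C++14 sketch below (hypothetical names, not the article's code), each node's own word is emitted before its EQKID subtree, because a word precedes its extensions (“bat” before “bats”), and words are appended to the end of a std::vector; Traverse above instead visits EQKID before the node's own string and prepends each word with AddHead.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: the complete word is stored on the node of its last character.
struct Node
{
    char splitChar;
    std::unique_ptr<Node> lo, eq, hi;
    std::string word;  // non-empty only where a word ends
    explicit Node(char c) : splitChar(c) {}
};

void Insert(std::unique_ptr<Node>& root, const std::string& word)
{
    if (word.empty()) return;
    std::unique_ptr<Node>* link = &root;
    std::size_t pos = 0;
    while (true)
    {
        char c = word[pos];
        if (!*link) *link = std::make_unique<Node>(c);
        Node* node = link->get();
        if (c < node->splitChar) link = &node->lo;
        else if (c > node->splitChar) link = &node->hi;
        else if (pos + 1 < word.size()) { link = &node->eq; ++pos; }
        else { node->word = word; return; }
    }
}

// In-order walk: smaller characters, this node's word, its extensions, larger characters.
void CollectAll(const Node* node, std::vector<std::string>& out)
{
    if (!node) return;
    CollectAll(node->lo.get(), out);                     // smaller character here
    if (!node->word.empty()) out.push_back(node->word);  // e.g. "bat" before "bats"
    CollectAll(node->eq.get(), out);                     // words extending this prefix
    CollectAll(node->hi.get(), out);                     // larger character here
}
```

Inserting sat, bat, bats, ant and cat and then calling CollectAll yields ant, bat, bats, cat, sat, independent of the insertion order.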
PartialMatch fills in the words that match the partially typed text as we type. It first uses Search to find the node for the typed prefix, then calls Traverse_And_Match from that node.
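void CTernarySearchTree::PartialMatch(CTSTNode* nodeptr, char* String)
{
// Locate the node reached by the typed prefix.
CTSTNode* Found_At = Search(nodeptr, String);
if (!Found_At) return;   // no stored word starts with this prefix
// Collect the matching words starting from that node.
Traverse_And_Match(Found_At, String);
}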
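A prefix match of this kind can be sketched in two steps, mirroring the structure of PartialMatch: locate the node reached by the prefix's last character, then collect the word stored there (if any) together with every word in that node's EQKID subtree. The found node's own LOKID and HIKID subtrees are skipped because they hold different characters at that position. This is a minimal standalone C++14 sketch with hypothetical names; it is not the article's Search or Traverse_And_Match:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: the complete word is stored on the node of its last character.
struct Node
{
    char splitChar;
    std::unique_ptr<Node> lo, eq, hi;
    std::string word;  // non-empty only where a word ends
    explicit Node(char c) : splitChar(c) {}
};

void Insert(std::unique_ptr<Node>& root, const std::string& word)
{
    if (word.empty()) return;
    std::unique_ptr<Node>* link = &root;
    std::size_t pos = 0;
    while (true)
    {
        char c = word[pos];
        if (!*link) *link = std::make_unique<Node>(c);
        Node* node = link->get();
        if (c < node->splitChar) link = &node->lo;
        else if (c > node->splitChar) link = &node->hi;
        else if (pos + 1 < word.size()) { link = &node->eq; ++pos; }
        else { node->word = word; return; }
    }
}

// In-order walk collecting every word in a subtree, alphabetically.
void CollectAll(const Node* node, std::vector<std::string>& out)
{
    if (!node) return;
    CollectAll(node->lo.get(), out);
    if (!node->word.empty()) out.push_back(node->word);
    CollectAll(node->eq.get(), out);
    CollectAll(node->hi.get(), out);
}

// Returns every stored word that starts with prefix, in alphabetical order.
std::vector<std::string> PrefixMatch(const Node* root, const std::string& prefix)
{
    std::vector<std::string> out;
    if (prefix.empty()) { CollectAll(root, out); return out; }
    const Node* node = root;
    std::size_t pos = 0;
    while (node)  // step 1: find the node of the prefix's last character
    {
        char c = prefix[pos];
        if (c < node->splitChar) node = node->lo.get();
        else if (c > node->splitChar) node = node->hi.get();
        else if (pos + 1 < prefix.size()) { node = node->eq.get(); ++pos; }
        else break;
    }
    if (!node) return out;                                // no word has this prefix
    if (!node->word.empty()) out.push_back(node->word);   // the prefix is itself a word
    CollectAll(node->eq.get(), out);                      // step 2: all longer matches
    return out;
}
```

With bat, bats, bad, cat and ball inserted, PrefixMatch(root.get(), "ba") returns bad, ball, bat and bats, while PrefixMatch(root.get(), "bat") returns only bat and bats.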
Conclusion
This kind of application could serve as the basis for a dictionary application on mobile phones. The NearSearch algorithm can be used for spell checking, and the PartialMatch functionality can be used to build a phone book on a mobile device.
Reference
- The article “Ternary Search Trees” by Jon Bentley and Bob Sedgewick that appeared in Dr. Dobb’s Journal.
History
- 8th April, 2006: Initial post