Click here to Skip to main content
11,501,968 members (63,796 online)
Click here to Skip to main content

A Data Structure to Store Dictionaries

, 27 Apr 2009 CPOL 25.3K 315 8
Rate this:
Please Sign up or sign in to vote.
A data structure to store dictionaries in memory space-efficiently

Introduction

Excerpt from The Algorithm Design Manual by Steven S. Skiena: The abstract data type "dictionary" is one of the most important structures in Computer Science. Dozens of different data structures have been proposed for implementing dictionaries including hash tables, skip lists, and balanced/unbalanced binary search trees -- so choosing the right one can be tricky. Depending on the application, it is also a decision that can significantly impact performance. In practice, it is more important to avoid using a bad data structure than to identify the single best option available.

This article explains a data-structure that can be used to store a dictionary (words and their definitions) in memory.

Background

A "Tree", which is a very elementary data structure, has been brought into application here.

Using the Code

The code consists of the CharacterNode class that stores a char value along with references to 26 other CharacterNodes. The notion behind choosing 26 as the number of references is that this code presently is restricted to only storing the 26 English alphabets.

As can be seen in the diagram, the root object of CharacterNode points towards 'a' and 'f'. The nodes corresponding to the characters 'a' and 'f' further point towards other letters, which further follow the same. The code snippet that describes the CharacterNode class is given:

class CharacterNode
{
    char c;
    CharacterNode next[]=new CharacterNode[26];
    CharacterNode()
    {
        c='a';
        for(int i=0;i<26;i++)
        {
            next[i]=null;
        }    
    }
    //..rest of the class definition
}

The root (see the diagram) always holds the character 'a' (not shown in the diagram). This is because it is the first object formed (as shown below), and any object will always hold 'a' as the initial character (see the constructor above).

public class CharacterNodeTree
{
    public static void main(String args[]) throws IOException
    {
        CharacterNode root=new CharacterNode();
        //..rest of the code  

The insert() routine describes how a character will be stored in the dictionary.

void insert(char ch)
{
    this.c=ch;
}

The insert() routine from main() is called in the following manner:

CharacterNode temp=null;
String str=null;

System.out.print("Enter the string to insert:");
BufferedReader br1=new BufferedReader(new InputStreamReader(System.in));
str=br1.readLine();
root.next[str.charAt(0)-97]=new CharacterNode();
temp=root.next[str.charAt(0)-97];
for(int i=0;i<str.length();i++)
{
    temp.insert(str.charAt(i),i);
    try
    {
        temp.next[str.charAt(i+1)-97]=new CharacterNode();
        temp=temp.next[str.charAt(i+1)-97];
    }
    catch(StringIndexOutOfBoundsException e) {}
}

Hence, a simple calculation allows the insertion of an alphabet from the string. This calculation is nothing but the position of the alphabet in the alphabetical sequence (like 0 for 'a', 1 for 'b', 25 for 'z', etc.). So forth, the other alphabets from the string can be inserted, and therefore the whole of the string can be inserted.

The find() routine describes how a complete string will be retrieved from the dictionary.

boolean find(String str)
{
    CharacterNode temp=this;
    for(int i=0;i<str.length();i++)
    {
        if(temp.next[str.charAt(i)-97]==null)
        {
            return false;
        }
        else
        {
            temp=temp.next[str.charAt(i)-97];
            continue;
        }
    }
    return true;
}

The find() routine from main() is called in the following manner:

System.out.print("Enter the string to find:");
BufferedReader br2=new BufferedReader(new InputStreamReader(System.in));
str=br2.readLine();
if(root.find(str)==true)
{
    System.out.println("FOUND");
}

Again, a simple calculation allows the retrieval of a string (or even a part of it).

Points of Interest

Looking at the diagram, the code can be further modified:

  1. A boolean value can be added to the CharacterNode class definition to indicate whether a particular character ends with a meaningful word or not. For example, in the diagram, all the leaves end (and must end due to the structure of the code) with a meaningful word ('aleak', 'alert', 'all', 'food', 'foot' are the words that have been shown added in the diagram). However, 'ale' is a word that does not end with a leaf but is still a meaningful word and is shown added. Therefore, the leaves ('k', 't', 'l', 'd', 't') and the non-leaf node ('e') must have boolean values as true.
  2. It is easier to find a word with special properties, like starting with a particular character(s) or ending with a particular character(s) (think about links that directly refer leaves!), or having a fixed length (since all the words with the same length occur at the same level, and it should not be difficult to code a link between all these words!).
  3. A recursive routine (using a stack preferably) may provide a way to retrieve all the words lexicographically.

History

  • 25th April, 2009: Initial post
  • 26th April, 2009: Uploaded source code

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

neeraj399
Other
India India
No Biography provided

Comments and Discussions

 
QuestionInformation Pin
Member 807643411-Jul-11 18:37
memberMember 807643411-Jul-11 18:37 
GeneralMy vote of 1 Pin
Matt Weber27-Apr-09 5:06
memberMatt Weber27-Apr-09 5:06 
GeneralMy vote of 1 Pin
Jean-Paul Mikkers26-Apr-09 12:09
memberJean-Paul Mikkers26-Apr-09 12:09 
GeneralRe: My vote of 1 Pin
neeraj39927-Apr-09 3:20
memberneeraj39927-Apr-09 3:20 
QuestionWhy reinvent the wheel? Pin
zlezj26-Apr-09 0:54
memberzlezj26-Apr-09 0:54 
AnswerRe: Why reinvent the wheel? Pin
neeraj39926-Apr-09 1:53
memberneeraj39926-Apr-09 1:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.150520.1 | Last Updated 27 Apr 2009
Article Copyright 2009 by neeraj399
Everything else Copyright © CodeProject, 1999-2015
Layout: fixed | fluid