Click here to Skip to main content
15,867,686 members
Articles / Web Development / ASP.NET
Article

Ternary Search Tree Dictionary in C#: Faster String Dictionary!

Rate me:
Please Sign up or sign in to vote.
4.57/5 (46 votes)
27 Jan 20043 min read 304.6K   4.9K   124   48
A TST is a fast and memory efficient data structure for implementing a string dictionary.

Sample Image - tst.png

Please comment your votes!

Introduction

This article presents a ternary search tree dictionary (a symbol table) that is generally as fast as the Hashtable but also features new search possibilities. This new symbol table can be used in replacement of the other .Net dictionaries when the keys are strings. The main class behaves almost like any IDictionary table, to the difference that it accepts string key only.

This data structure supports also interesting search methods such as partial match and near-neighbor match. For instance, the near-neighbor search can be used to provide similar words while implementing an online thesaurus.

The ternary search tree dictionary implementation is based on the article of Jon L. Bentley and Robert Sedgewick, Fast Algorithms for Sorting and Searching Strings, cfr. [1].

The class comes with a full set of NUnit test, benchmarks with Hashtable and some utility to visualize the tree.

What is a ternary search tree ?

The ternary search tree (TST) is a dictionary data structure specially crafted for the situation where the key are string. It can find, remove and add keys quickly. It can also easily search the tree for partial matches -- auto matches autograph, automatic, and so on. In addition, you can implement near-match functions, which gives you the ability to suggest alternatives for misspelled words -- e.g., impliment matches implement.

A TST stores key-value pairs, where keys are strings and values are objects. In addition, TSTs use memory efficiently to store large quantities of data. Best of all, all of this magic is performed lightning fast (see benchmark). The tremendous flexibility of TSTs provides ample opportunity for creative programming.

The picture in the article front shows how the ternary tree is layout for 12 two letters words: as at be by he in is it of on or to.

How to use it ?

The TstDictionary implements the dictionary. It can be used as any other IDictionary based class, the only restriction being that key must be strings.

TstDictionary tst = new TstDictionary();

Add entries

C#
tst.Add("hello", null);
// attach a value to the key
Object value = ...;
tst.Add("another key",value);

Remove entries

C#
tst.Remove("hello");

Clear dictionary

C#
tst.Clear();

Iterate over entries

C#
foreach(DictionaryEntry de in tst)
{
   // de.Key is the key,
   // de.Value is the value
}

Access elements

C#
tst["hello"]=null;
Object value = tst["hello"].Value;

Partial Match search

Partial match search with wild char character. Searching the dictionary for the pattern *o*o*o matches the single word rococo, while the pattern *a*a*a matches many words, including banana, casaba, and pajama:

C#
foreach(DictionaryEntry de in tst.PartialMatch("*a*a"))
{
   // de.Key matches the pattern
}

Near-Neighbor search

This methods finds all words in the dictionary that are within a given Hamming distance of a query word. For instance, a search for all words within distance two of soda finds code, coma and 117 other words.

C#
foreach(DictionaryEntry de in tst.NearNeighbors("a",1))
{
   // de.Key matches the pattern as, at, ma, ba, etc...
}

Why not implementing IDictionary ?

As mentionned in the introduction, the TstDictionary does not implement the IDictionary interface, although it implements all the methods needed. The rationale for this is to avoid confusing the user by letting him use non-string keys.

Threading

The TstDictionary class provides a Synchronized method to obtain a synchronized (thread-safe) wrapper of the dictionary. This wrapper behaves similarly to other .NET synchronized wrappers.

Benchmarks

Addition benchmarks

Contains benchmarks

Projects

The Tst project is composed of the following projects:

  • Tst, a class library that contains the TST implementation,
  • Tst.Utility, a set of helper classes to visualize the TST modifications,
  • Tst.Tests, NUnit tests
  • Tst.Benchmark, benchmarks with Hashtable,
  • Tst.Console, demo application.

Todo list

  • The enumerator should iterate the keys in alphabetical order, which it does not... yet.

History

  • 11-01-2004, initial release.

References

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Engineer
United States United States
Jonathan de Halleux is Civil Engineer in Applied Mathematics. He finished his PhD in 2004 in the rainy country of Belgium. After 2 years in the Common Language Runtime (i.e. .net), he is now working at Microsoft Research on Pex (http://research.microsoft.com/pex).

Comments and Discussions

 
GeneralNearest Neighbors not 100% accurate Pin
MaryJones11-Jan-06 13:04
MaryJones11-Jan-06 13:04 
GeneralRe: Nearest Neighbors not 100% accurate Pin
Tomislav Kosutic26-Jan-06 23:15
Tomislav Kosutic26-Jan-06 23:15 
GeneralRe: Nearest Neighbors not 100% accurate Pin
Jesus Navarro27-Mar-08 6:46
Jesus Navarro27-Mar-08 6:46 
GeneralRe: Nearest Neighbors not 100% accurate Pin
kchvilyov27-Apr-08 3:28
kchvilyov27-Apr-08 3:28 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.