
Please comment your votes!
Introduction
This article presents a ternary search tree dictionary (a symbol table) that is generally as fast as the Hashtable
but also features new search possibilities. This new symbol table can be used in replacement of the other .Net dictionaries when the keys are strings. The main class behaves almost like any IDictionary
table, to the difference that it accepts string key only.
This data structure supports also interesting search methods such as partial match and near-neighbor match. For instance, the near-neighbor search can be used to provide similar words while implementing an online thesaurus.
The ternary search tree dictionary implementation is based on the article of Jon L. Bentley and Robert Sedgewick, Fast Algorithms for Sorting and Searching Strings, cfr. [1].
The class comes with a full set of NUnit test, benchmarks with Hashtable
and some utility to visualize the tree.
What is a ternary search tree ?
The ternary search tree (TST) is a dictionary data structure specially crafted for the situation where the key are string. It can find, remove and add keys quickly. It can also easily search the tree for partial matches -- auto matches autograph, automatic, and so on. In addition, you can implement near-match functions, which gives you the ability to suggest alternatives for misspelled words -- e.g., impliment matches implement.
A TST stores key-value pairs, where keys are strings and values are objects. In addition, TSTs use memory efficiently to store large quantities of data. Best of all, all of this magic is performed lightning fast (see benchmark). The tremendous flexibility of TSTs provides ample opportunity for creative programming.
The picture in the article front shows how the ternary tree is layout for 12 two letters words: as at be by he in is it of on or to.
How to use it ?
The TstDictionary
implements the dictionary. It can be used as any other IDictionary
based class, the only restriction being that key must be strings.
TstDictionary tst = new TstDictionary();
Add entries
tst.Add("hello", null);
Object value = ...;
tst.Add("another key",value);
Remove entries
tst.Remove("hello");
Clear dictionary
tst.Clear();
Iterate over entries
foreach(DictionaryEntry de in tst)
{
}
Access elements
tst["hello"]=null;
Object value = tst["hello"].Value;
Partial Match search
Partial match search with wild char character. Searching the dictionary for the pattern *o*o*o matches the single word rococo, while the pattern *a*a*a matches many words, including banana, casaba, and pajama:
foreach(DictionaryEntry de in tst.PartialMatch("*a*a"))
{
}
Near-Neighbor search
This methods finds all words in the dictionary that are within a given Hamming distance of a query word. For instance, a search for all words within distance two of soda finds code, coma and 117 other words.
foreach(DictionaryEntry de in tst.NearNeighbors("a",1))
{
}
Why not implementing IDictionary ?
As mentionned in the introduction, the TstDictionary
does not implement the IDictionary
interface, although it implements all the methods needed. The rationale for this is to avoid confusing the user by letting him use non-string keys.
Threading
The TstDictionary
class provides a Synchronized
method to obtain a synchronized (thread-safe) wrapper of the dictionary. This wrapper behaves similarly to other .NET synchronized wrappers.
Benchmarks


Projects
The Tst project is composed of the following projects:
Tst
, a class library that contains the TST implementation,
Tst.Utility
, a set of helper classes to visualize the TST modifications,
Tst.Tests
, NUnit tests
Tst.Benchmark
, benchmarks with Hashtable
,
Tst.Console
, demo application.
Todo list
- The enumerator should iterate the keys in alphabetical order, which it does not... yet.
History
- 11-01-2004, initial release.
References