Click here to Skip to main content
Click here to Skip to main content

an efficient C++/STL library for word puzzles and spell checking

, 14 Sep 2005 CPOL
Rate this:
Please Sign up or sign in to vote.
DicoLib stores words in lists of anagrams indexed by their length and a 26 bits bitset which describe which letters are present in the words. This makes it extremely fast to search for words which contain specified letters, and to search for words which are "close" for spell checking appli

Introduction

This project started from a puzzle : There is a common English word that is nine letters long. Each time you remove a letter from it, it still remains an English word - from nine letters right down to a single letter. What is the original word, and what are the words that it becomes after removing one letter at a time? I wrote DicoLib to find the word, but since I stupidely translated "nine" by "8" in the code, I didn't find THE solution, but some others listed here DicoLib uses an original data structure that optimises the search of words which are "close" to a given one or contain the same letters. It is therefore suitable for spell checking, although this application hasn't (yet) been developed.

Data Structure

DicoLib's indexes words:

  • by their length
  • and by a bitset of 26 bits which describe which letters are present in the word

Actually, DicoLib stores a list of anagrams for each length/bitset pair. Finding anagrams of a given word therefore takes 0 time!

Finding words that are made of specified letters for Scrabble, or of a given length and contianing given letters is extremely fast as it requires mainly bitset operations and table lookup.

The whole data structure used in DicoLib is depicted in the schema below:

Source code

DicoLib is available on SourceForge You're welcome to contribute to the development of this project towards a very efficient spell-checker or any other application you may think of.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Dr. Goulu
Technical Lead
Switzerland Switzerland
Born 1963, Programs since 1979 (Commodore PET).
PhD, consultant and software architect of technical software, especially add-ins for mechanical engineereing CAD.
I like science in general, geometry, algorithms and Python
Follow on   Twitter   Google+   LinkedIn

Comments and Discussions

 
GeneralWould have been valuable if a little more care were shown. PinmemberWREY16-Sep-05 9:49 
GeneralBroken link PinmemberTheGlynn14-Sep-05 1:15 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.141216.1 | Last Updated 14 Sep 2005
Article Copyright 2005 by Dr. Goulu
Everything else Copyright © CodeProject, 1999-2014
Layout: fixed | fluid