Click here to Skip to main content
6,634,665 members and growing! (15,887 online)
Email Password   helpLost your password?
Languages » C / C++ Language » General     Intermediate License: The Code Project Open License (CPOL)

an efficient C++/STL library for word puzzles and spell checking

By Philippe Guglielmetti

DicoLib stores words in lists of anagrams indexed by their length and a 26 bits bitset which describe which letters are present in the words. This makes it extremely fast to search for words which contain specified letters, and to search for words which are "close" for spell checking appli
VC6, VC7, VC7.1, VC8.0, Windows, STL, VS.NET2003, Architect, Dev
Posted:14 Sep 2005
Views:17,450
Bookmarked:17 times
Unedited contribution
Announcements
Loading...
 
Search    
Advanced Search
Add to IE Search
printPrint   add Share
      Discuss Discuss   Broken Article?Report  
6 votes for this article.
Popularity: 1.40 Rating: 1.80 out of 5
5 votes, 83.3%
1

2

3
1 vote, 16.7%
4

5

Introduction

This project started from a puzzle : There is a common English word that is nine letters long. Each time you remove a letter from it, it still remains an English word - from nine letters right down to a single letter. What is the original word, and what are the words that it becomes after removing one letter at a time? I wrote DicoLib to find the word, but since I stupidely translated "nine" by "8" in the code, I didn't find THE solution, but some others listed here DicoLib uses an original data structure that optimises the search of words which are "close" to a given one or contain the same letters. It is therefore suitable for spell checking, although this application hasn't (yet) been developed.

Data Structure

DicoLib's indexes words:

  • by their length
  • and by a bitset of 26 bits which describe which letters are present in the word

Actually, DicoLib stores a list of anagrams for each length/bitset pair. Finding anagrams of a given word therefore takes 0 time!

Finding words that are made of specified letters for Scrabble, or of a given length and contianing given letters is extremely fast as it requires mainly bitset operations and table lookup.

The whole data structure used in DicoLib is depicted in the schema below:

Source code

DicoLib is available on SourceForge You're welcome to contribute to the development of this project towards a very efficient spell-checker or any other application you may think of.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Philippe Guglielmetti


Member
Born 1963, Programs since 1979 (Commodore PET).
PhD, consultant and software architect of technical software, especially add-ins for mechanical engineereing CAD.
I like science in general, geometry, algorithms and C++
Occupation: Software Developer (Senior)
Location: Switzerland Switzerland

Other popular C / C++ Language articles:

Article Top
You must Sign In to use this message board.
FAQ FAQ 
 
Noise Tolerance  Layout  Per page   
 Msgs 1 to 2 of 2 (Total in Forum: 2) (Refresh)FirstPrevNext
GeneralWould have been valuable if a little more care were shown. PinmemberWREY9:49 16 Sep '05  
GeneralBroken link PinmemberTheGlynn1:15 14 Sep '05  

General General    News News    Question Question    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

PermaLink | Privacy | Terms of Use
Last Updated: 14 Sep 2005
Editor:
Copyright 2005 by Philippe Guglielmetti
Everything else Copyright © CodeProject, 1999-2009
Web18 | Advertise on the Code Project