A Spell Checking Engine

Matt Gullett

5 Feb 2001

A free spell checking engine for use in your C++ applications. Includes the current US English dictionary

Download demo project - 85 Kb

Download the current US English Dictionary - 820 Kb

Description:

This project is an evolution of the spell checking engine project I submitted earlier. This project includes numerous enhancements to the core spelling engine plus the addition of a "check-as-you-type" edit control and the related support dialogs (see above).

This project is not complete; it is a work in progress. The current version has numerous issues that need to be addressed. My long-term goal is to develop it to "commercial quality".

I am going to continue to improve this engine toward my goal. I will continue to post updates as I feel necessary.

Changes from previous version:

Reorganized class architecture. Added CFPSSpellCheckEngineOptions, CFPSDictionary to core engine.
Created CFPSSpellingEditCtrl CEdit derived class to implement "check-as-you-type" edit control.
Created options property pages (see above)
Created spelling dialog (see above)
Created common-use dictionary. (Download available above)
Updated US English dictionary w/improved word list + proper names.
Incremental changes to the MetaphoneEx function.
Addition of the EditDistance function.
Added support for case-sensitive dictionary entries.
Added file header to dictionaries.

To-do-list:

Re-write EditDistance algorithm for performance.
Research compression options for dictionaries
Create ATL ActiveX control from edit control
Create COM based dictionary support for language independence
Create Rich Edit "check-as-you-type" control.
Add auto-correct capability.
Add sentence begin recognition for automatic upper case decisions.
Continue to improve US English dictionary.
Implement a binary-search mechanism on dictionary look-ups.
Continue to improve MetaphoneEx function.
Create C# (.net) version (when C# stabilizes)
ETC...
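
The binary-search item in the list above could be approached along the following lines. This is only a minimal sketch, assuming the word list fits in memory as a sorted vector; `SortedWordList` and its members are hypothetical names and are not part of the engine.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the engine: a word list that is kept
// sorted so that look-ups can use binary search.
class SortedWordList {
public:
    // Insert a word at its sorted position, skipping duplicates.
    void Add(const std::string& word) {
        auto pos = std::lower_bound(m_words.begin(), m_words.end(), word);
        if (pos == m_words.end() || *pos != word)
            m_words.insert(pos, word);
    }

    // O(log n) membership test; the comparison is case-sensitive.
    bool Contains(const std::string& word) const {
        return std::binary_search(m_words.begin(), m_words.end(), word);
    }

    std::size_t Size() const { return m_words.size(); }

private:
    std::vector<std::string> m_words;  // always kept in ascending order
};
```

Inserting into the middle of a vector is linear, so a real dictionary would more likely load an already sorted file once and only search it afterwards.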

Classes:

`CFPSSpellCheckEngine`	This is the core spelling engine. It is intended to be language independent, although it currently is not. This engine encapsulates the functionality of managing dictionaries, making suggestions (through dictionaries), and maintaining spelling options.
`CFPSSpellCheckEngineOptions`	Support class for `CFPSSpellCheckEngine` which implements support for storing, saving and loading spell checking options. Currently, this uses a serialized file to store options, but could easily be changed to INI file or registry.
`CFPSDictionary`	Base dictionary class. Defines a set of virtual functions generic to all dictionaries. Also, provides base implementation of all virtual functions based on current requirements. This class uses a defined file structure w/ a file header and any number of dictionary records. Future derivations of this class will provide language specific support.
`CDlgSpellChecker`	`CDialog` derived class which implements the spell checker dialog. Currently the undo support is based on edit control undo support, not spell checker undo. Need to improve this further.
`CPrShtSpellOptions`	`CPropertySheet` derived class which implements the spell checking engine's property sheet.
`CPrPgeSpellOptions_General`	`CPropertyPage` derived class implementing general options panel.
`CPrPgeSpellOptions_User`	`CPropertyPage` derived class implementing user dictionary options panel.
`CPrPgeSpellOptions_Common`	`CPropertyPage` derived class implementing common misspellings options panel.

Support functions of importance:

`void CheckSpellingEdit (CFPSSpellCheckEngine* pEngine, CEdit* pEdit)`	This function is called when a user presses the F7 (or configured) hot key from within the "check-as-you-type" edit control. It displays the CDlgSpellChecker dialog box.
`void CheckSpellingRich (CFPSSpellCheckEngine* pEngine, CRichEditCtrl* pEdit)`	This function is called when a user presses the F7 (or configured) hot key from within the "check-as-you-type" edit control. It displays the CDlgSpellChecker dialog box. NOTE: This function is not currently being used because the rich edit control is not complete.
`int EditDistance(const char* szWord1, const char* szWord2)`	This function is passed two words and returns an approximation of the minimum number of changes a user would need to make for the two words to match. This function is not a true edit-distance algorithm, but is a customized algorithm for this spell checking application.
`void MetaphoneEx(const char* szInput, char* szOutput, int iMaxLen)`	This function is passed a word and returns (through the szOutput parameter) a modified-metaphone representation of the word. This is a variation on the algorithm originally written by Lawrence Philips (http://www.cuj.com/archive/1806/feature.html). A newer version of his algorithm (double-metaphone) is also available. I have tested double-metaphone with the spell checking engine and was not impressed with the results. It does provide fast results and a high hit rate, but it also returns far too many results (on average). However, I am considering using it in conjunction with the `EditDistance` algorithm and will review this further.
`void SortMatches(LPCSTR lpszBadWord, CStringList &Matches)`	This function sorts a list of word suggestions based on the approximate edit-distance between the words in the list and the misspelled word passed in as lpszBadWord.
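
For readers unfamiliar with edit distance, the following sketch shows the textbook Levenshtein algorithm and a sort built on top of it. It is not the customized `EditDistance` used by the engine, and `LevenshteinDistance` and `SortByDistance` are hypothetical names that use standard containers instead of `CStringList`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein distance: insertions, deletions and substitutions
// each cost 1. Textbook algorithm, not the engine's customized version.
int LevenshteinDistance(const std::string& a, const std::string& b) {
    // Only two rows of the dynamic-programming table are kept in memory.
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int deletion = prev[j] + 1;
            int insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Order suggestions so the words closest to the misspelled word come first.
// stable_sort keeps the original order among equally distant words.
void SortByDistance(const std::string& badWord,
                    std::vector<std::string>& matches) {
    std::stable_sort(matches.begin(), matches.end(),
        [&badWord](const std::string& x, const std::string& y) {
            return LevenshteinDistance(badWord, x) <
                   LevenshteinDistance(badWord, y);
        });
}
```

The comparator recomputes distances on every comparison, which is simple but wasteful; caching one distance per suggestion before sorting would be the obvious improvement.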

Architecture:

CORE ENGINE
The core spell checking engine consists of three classes: CFPSSpellCheckEngine, CFPSSpellCheckEngineOptions and CFPSDictionary. These classes provide support for dictionary related functions such as adding a word, removing a word, ignoring a word, loading a dictionary, saving a dictionary, checking whether a word is in the dictionary, suggesting possible matches, etc.

The core engine is implemented as a strict back-end engine. It has no user-interface components. Most of the functions exposed by these classes where an error might occur return an int return code. These return codes are defined in 1) FPSSpellCheckerInclude.h and 2) the header file for a given class. The return codes should always be examined to determine the completion status of these functions.

Special care has been taken to ensure that these classes are very stable and robust. Performance considerations also weigh heavily on the implementation of these classes. Very little MFC code is used in these classes and functions.

CHECK-AS-YOU-TYPE EDIT CONTROL
The check-as-you-type edit control is contained in the CFPSSpellingEditCtrl class. It is derived from CEdit and works by subclassing an existing edit control through the AttachEdit function.

To improve performance, this control implements a timer and checks the spelling of the displayed portion of the edit control only when there is no user activity (typing, mouse clicking, scrolling, etc). The function RedrawSpellingErrors is called to perform the checking. It checks only the displayed portion of the edit control and calls DrawSpellingError for each displayed word. If a word is not found in the dictionary, DrawSpellingError calls DrawSquigly to draw the squiggly underline for the word. DrawSquigly creates a structure of type FPSSPELLEDIT_ERRORS and adds it to the m_SpellingErrors member list.
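
The idle-detection idea can be sketched independently of MFC timers. In the example below, `IdleSpellTrigger` and its members are hypothetical names, and firing only once per burst of activity is an assumption of this sketch rather than a description of what the control does.

```cpp
#include <chrono>

// Hypothetical helper: decides when the user has been idle long enough
// for a deferred spelling pass. Times are passed in by the caller so the
// logic does not depend on any particular timer API.
class IdleSpellTrigger {
public:
    using Clock = std::chrono::steady_clock;

    explicit IdleSpellTrigger(std::chrono::milliseconds idleDelay)
        : m_idleDelay(idleDelay), m_lastActivity(), m_dirty(false) {}

    // Call on every key press, mouse click, or scroll.
    void OnUserActivity(Clock::time_point now) {
        m_lastActivity = now;
        m_dirty = true;
    }

    // Call from the periodic timer. Returns true at most once per burst
    // of activity, and only after the idle delay has elapsed.
    bool ShouldCheckNow(Clock::time_point now) {
        if (!m_dirty || now - m_lastActivity < m_idleDelay)
            return false;
        m_dirty = false;
        return true;
    }

private:
    std::chrono::milliseconds m_idleDelay;     // required quiet period
    Clock::time_point         m_lastActivity;  // time of the last user input
    bool                      m_dirty;         // input seen since last check
};
```

In a real control, the timer handler would call something like `ShouldCheckNow(Clock::now())` and run the redraw pass only when it returns true.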

The OnRButtonDown function checks the m_SpellingErrors list to determine whether to display the normal popup menu or the spell check popup menu. Suggestions returned from the core engine are sorted using the SortMatches function to display them in order of edit-distance.
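
The right-click decision boils down to a hit test against the recorded error rectangles. The sketch below assumes each recorded error stores the rectangle of its underlined word; `SpellingErrorSketch` and `FindErrorAt` are hypothetical stand-ins, since the actual fields of FPSSPELLEDIT_ERRORS are not shown in this article.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of m_SpellingErrors: the client-area
// rectangle of an underlined word plus the word itself (fields assumed).
struct SpellingErrorSketch {
    int left, top, right, bottom;
    std::string word;
};

// Returns the error under the point (x, y), or nullptr when the click
// landed elsewhere and the normal popup menu should be shown.
const SpellingErrorSketch* FindErrorAt(
        const std::vector<SpellingErrorSketch>& errors, int x, int y) {
    for (const auto& e : errors) {
        // Right and bottom edges are exclusive, as with Win32 rectangles.
        if (x >= e.left && x < e.right && y >= e.top && y < e.bottom)
            return &e;
    }
    return nullptr;
}
```

A non-null result would lead to the spell check popup menu for that word; a null result falls back to the standard edit menu.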

The PreTranslateMessage function checks for a hot key (defaults to F7). This can be customized by calling the SetHotKey static member function. When the hot key is pressed, the CheckSpellingEdit function is called to display the spell checking dialog box.

SPELL CHECK DIALOG BOX

The spell checking dialog box is implemented in the CDlgSpellChecker class. This is a standard CDialog derived class based on the IDD_SPELL_CHECK dialog resource.

The spell checking dialog is modelled after the Microsoft Word implementation of spell checking. It is laid out the same and functions (for the most part) the same. This dialog searches an edit control (or rich edit control) for misspelled words and displays the sentence containing each one, with the misspelled word highlighted.

Suggestions returned from the core engine are sorted using the SortMatches function to display them in order of edit-distance.

How to use the demo:

Unzip the provided file into a directory (be sure to extract the sub directories.)
Make sure that the USMain.dic file is in the \Release directory.
Make sure that the USCommon.dic file is in the \Release directory.
Execute the FPSSpellChecker.exe from the \Release directory.

How to incorporate the spell checker into an application:

In your application's InitInstance function, add a call to the CFPSSpellingEditCtrl::InitSpellingEngine(NULL) static member function. Instead of NULL, you can pass in a string containing a fully qualified path to a spell checking engine options file.
In your application's ExitInstance function, add a call to the CFPSSpellingEditCtrl::Terminate static member function.

Add the following files to your project.

DlgSpellChecker.cpp	DlgSpellChecker.h
DlgSpellingEditCtrl.cpp	DlgSpellingEditCtrl.h
FPSDictionary.cpp	FPSDictionary.h
FPSSpellCheckEngine.cpp	FPSSpellCheckEngine.h
FPSSpellCheckEngineOptions.cpp	FPSSpellCheckEngineOptions.h
FPSSpellCheckerInclude.cpp	FPSSpellCheckerInclude.h
FPSSpellingEditCtrl.cpp	FPSSpellingEditCtrl.h
PrPgeSpellOptions_Common.cpp	PrPgeSpellOptions_Common.h
PrPgeSpellOptions_General.cpp	PrPgeSpellOptions_General.h
PrPgeSpellOptions_User.cpp	PrPgeSpellOptions_User.h
PrShtSpellOptions.cpp	PrShtSpellOptions.h

Copy the following resource items to your project.

IDD_SPELL_CHECK
IDD_SPELL_OPTION_COMMON
IDD_SPELL_OPTION_GENERAL
IDD_SPELL_OPTION_USER

Include the "FPSSpellCheckerInclude.h" file in your stdafx.h file.
#include "FPSSpellCheckerInclude.h"
Place a standard edit control on a form or dialog resource and give it a unique control ID (e.g. ID_TEST_EDIT).
Add a member variable of type CFPSSpellingEditCtrl to the dialog/form class (e.g. m_editTest).
In the OnInitDialog function, call the `AttachEdit` member function of CFPSSpellingEditCtrl, e.g.:
```
m_editTest.AttachEdit(this, ID_TEST_EDIT);
```

Known Issues

Performance is still not as good as it needs to be.
Language support is limited to US English.
The EditDistance function needs work.
The MetaphoneEx function needs work.
There is a painting problem with the edit control when it is scrolled while the spelling error "squiggly" lines are displayed.
No complete support for rich edit control.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
