A WTL Hunspell-checked Edit Control

21 Jun 2009 | CPOL | 12 min read
A WTL Hunspell-checked edit control.

Introduction

I have an application that is used for setting land values in New South Wales, Australia. The various people that use the program are required to enter notes and justifications for their values, but for various reasons, spelling mistakes creep into their notes. It's probably due to gremlins fiddling with the database overnight, because none of them actually make mistakes themselves. In order to help them find the errors, they requested that I include a spell checker with the program.

Background

My application is written using WTL. I'm actually becoming a bit jaded with this ... there is quite a bit of source code now, and it's getting to the point where it's taking a very long time to build. sloccount says that there are about 110,000 lines of C++ code, and at least half of it has to be rebuilt and re-linked when anything of significance changes. Painful. Just out of curiosity, I checked the preprocessor output of the main source code file: there are around 50,000 lines of non-blank, non-comment code after the inclusion of stdafx.h, and something over 400,000 including the stdafx.h code. I'm not trying to brag about the size of the project here (it's just not that big); I'm saying that if you're considering WTL, and the project is going to be big, think long and hard about it. Build times get out of hand.

Anyway, I went in search of a spell-checker. I've been using VSSPELL version 6 for simple behind-the-scenes spell checking for a different client, but that's getting a bit long in the tooth. Also, although I bought the thing quite a few years back, it appears to be popping up a nag screen when I want to use the GUI component. I really wanted to use the GUI component so that it could hook into my edit windows and do the red underlining for me. So something else was required.

I came to The Code Project first, and found Matt Gullett's Spell Checking Engine. This was written for MFC, so wasn't immediately useful to me. I also wanted an Australian dictionary. The hunt continued. I did regular net searches and found aspell, Hunspell, and several other commercial offerings. Couldn't find Matt Gullett's www.spellican.com. Eventually, I settled on Hunspell. It's Open Source, people seemed happy with it, there is a nice MSVC project available for the library, and the Australian English dictionaries are available. And, well, gosh ... if it's good enough for OpenOffice, it's good enough for me.

So I downloaded it, built it, and then had to try and figure out how to use it. The API is not well documented ... or maybe I just couldn't find the documentation. I tracked down NHunspell, and that turned out to be useful in figuring out how to use the API. So all was good.

At that point, I needed to incorporate the checker into the edit window. Out came Matt Gullett's code, and I unashamedly stole his edit control code, and ported it into the WTL environment.

Using the Code

There are three main items that I want to address: using the wrapper that I built for the Hunspell code, using the CSpellCheckEdit class, and the anatomy of the CSpellCheckEdit class.

Using the Singleton SpellCheck Wrapper Class

I've been using STL for my strings and collections throughout my application, so I've continued with the same convention here. It's a good fit with the WTL stuff. Having said that, I've provided both const char* and const std::string& versions of methods where it's reasonable. Feel free to add your own const CString& methods as well.

Please note that the version of Hunspell that I downloaded doesn't come with Unicode methods. If you're writing code in a Unicode environment, you should have a look at NHunspell, as it contains all of the wide-to-multibyte string conversion code that you will need.

Initialisation

The SpellCheck wrapper class is a singleton class. Why? Because starting up the Hunspell checker is expensive, and I only want to do it once. To get things under way, you just need to get a reference to the singleton and then tell it where your dictionaries are.

C++
SpellCheck& sc = SpellCheckS::instance();
sc.loadDicts("en_AU.aff", "en_AU.dic", "custom.dic");

When you invoke the loadDicts() method, the SpellCheck object starts a thread to create the actual Hunspell object and load the dictionaries. It also reads the words (one word per line) from the "custom.dic" file and adds them to the dictionary.

Checking Words

To check a word, invoke the singleton's wordIsOK() method. It returns true (the spelling is valid) if the dictionaries are not yet available for checking or if they contain the word, and false if the spelling engine determines that the word is incorrect.

C++
SpellCheck& sc = SpellCheckS::instance();
if (sc.wordIsOK(lpszWord))
{
    // word is OK, don't need to check any further
    return;
}

I indicated above that the dictionaries might not be available for checking words at the time you want to check. The reason is that the dictionaries are loaded by a separate thread. I found that the dictionaries didn't load quite as quickly as I would like, particularly in debug mode. My users are used to the login screen coming up immediately, and I didn't want to have to delay the appearance of this window, or introduce a splash screen that loaded stuff in the background.

So, I have a _ready flag in the object. This is set to true when the dictionaries have been loaded. Until this has been done, every word checked will be shown to be correct. I can't have a bunch of red ink all over the screen just because the spell check is slow loading. When it has loaded successfully, everything will seamlessly switch over to checking as per expectation.

All of the code that interacts with the actual or custom dictionaries is protected by a critical section.
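
The ready-flag idea can be sketched in portable C++. This is only an illustration under assumptions: ReadyGatedChecker and its members are hypothetical names, a std::set stands in for the Hunspell object, and a std::mutex plays the part of the critical section. It is not the SpellCheck class from the download.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the SpellCheck wrapper. A std::set plays the
// role of the Hunspell object so that the sketch stays portable.
class ReadyGatedChecker
{
public:
    ReadyGatedChecker() : ready_(false) {}
    ~ReadyGatedChecker() { if (worker_.joinable()) worker_.join(); }

    // Starts loading on a worker thread and returns immediately.
    void loadAsync(const std::vector<std::string>& words)
    {
        if (worker_.joinable()) worker_.join();
        worker_ = std::thread(&ReadyGatedChecker::load, this, words);
    }

    // The loading work itself; loadAsync runs this on the worker thread.
    void load(const std::vector<std::string>& words)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            dictionary_.insert(words.begin(), words.end());
        }
        ready_.store(true);  // from here on, real checks take place
    }

    // Answers true while loading is still in progress, so that nothing is
    // flagged as an error before the dictionary exists.
    bool wordIsOK(const std::string& word)
    {
        if (!ready_.load()) return true;
        std::lock_guard<std::mutex> lock(mutex_);
        return dictionary_.count(word) != 0;
    }

    bool ready() const { return ready_.load(); }

private:
    std::atomic<bool> ready_;
    std::mutex mutex_;
    std::set<std::string> dictionary_;
    std::thread worker_;
};
```

Calling loadAsync() returns straight away. Until the worker finishes, wordIsOK() answers true for every word, which matches the behaviour described above.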

Getting Suggestions

Obviously, users expect a bit more than just a red flag to let them know that they've misspelled a word. Hunspell does contain a suggest() method, so you can use this to get a list of suggestions for the user.

C++
SpellCheck& sc = SpellCheckS::instance();
STRINGLIST options;

sc.suggest(lpszWord, options);
for (STRINGLIST::iterator it = options.begin(); it != options.end(); it++)
{
    ATLTRACE("Suggestion: %s\n", it->c_str());
}

The nicest thing to do with such a list is to put it in a context menu so that the user can right-click a misspelled word and simply replace it with the correct one.

Adding Words

When I first started working with the Hunspell library, I wondered how to make it so that users could add words to the dictionary. Being property valuers, my users have their own collection of jargon and abbreviations that aren't necessarily represented in the common dictionary. I'd not really thought about it before, but the dictionaries provided with the library are essentially read-only. That's all well and good, but what about custom dictionaries? Hunspell has an add() method that allows you to add a word to the dictionary, but that is only for the duration of the Hunspell object. It doesn't propagate to the dictionary itself. I didn't really know what to do at that point.

Then, I experienced a D'oh! moment, and slapped my forehead. OK, when the user adds a word to the dictionary, I'll also write the word to their "custom.dic" file. When I load the dictionaries at launch time, I'll read that file and just add the words before making the spell checker available to the rest of the application. Right. Done.

C++
SpellCheck& sc = SpellCheckS::instance();
sc.add(lpszWord);

Easy.
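
For readers who want to see the "write it to custom.dic as well" idea in isolation, here is a minimal sketch. CustomWords is a hypothetical helper, not the SpellCheck class from the download, and it assumes the one-word-per-line file format mentioned earlier. In the real wrapper, the word would also be handed to Hunspell's add() method.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper: keeps the user's words in memory and mirrors them to
// a one-word-per-line text file.
class CustomWords
{
public:
    // Reads any words saved by earlier sessions. A missing file is fine.
    explicit CustomWords(const std::string& path) : path_(path)
    {
        std::ifstream in(path_.c_str());
        std::string line;
        while (std::getline(in, line))
        {
            // Tolerate Windows line endings.
            if (!line.empty() && line[line.size() - 1] == '\r')
                line.erase(line.size() - 1);
            if (!line.empty())
                words_.insert(line);
        }
    }

    // Adds the word for this session and appends it to the file so that it
    // is still known after the next launch. Returns false if the word was
    // empty or already known, or if the file could not be written.
    bool add(const std::string& word)
    {
        if (word.empty() || !words_.insert(word).second)
            return false;
        std::ofstream out(path_.c_str(), std::ios::app);
        out << word << '\n';
        out.flush();
        return out.good();
    }

    bool contains(const std::string& word) const
    {
        return words_.count(word) != 0;
    }

private:
    std::string path_;
    std::set<std::string> words_;
};
```

Constructing a second CustomWords on the same path after add() has been called finds the saved word again, which is the whole trick.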

Cleaning Up

When the program has finished, you should close the singleton SpellCheck object.

C++
SpellCheckS::close();

This ensures that the Hunspell object is deleted, so you don't get a million lines of memory leaks when you're debugging. Ahem.

Using the CSpellCheckEdit Class

OK, so we have the spell checker. Assume for the moment that we already have the CSpellCheckEdit class available. How should this be used in a given WTL dialog box? We need to do a couple of things.

  1. Include the SpellCheckEdit.h file.
  2. Create a CSpellCheckEdit variable.
  3. Subclass an edit control on the dialog.
  4. Reflect notifications.

What does this look like in code?

C++
#include "SpellCheckEdit.h"  // (1) above
class CMainDlg : public CDialogImpl<CMainDlg>
{
    /* ... */
    
public:
    CSpellCheckEdit scEdit;  // (2) above
    
    /* ... */
    
    BEGIN_MSG_MAP(CMainDlg)
        MESSAGE_HANDLER(WM_INITDIALOG, OnInitDialog)
        /* ... */
        REFLECT_NOTIFICATIONS()  // (4) above
    END_MSG_MAP()
    
    /* ... */
    
public:
    LRESULT OnInitDialog(UINT /*uMsg*/, WPARAM /*wParam*/, 
            LPARAM /*lParam*/, BOOL& /*bHandled*/)
    {
        /* ... */
        scEdit.SubclassWindow(GetDlgItem(IDC_EDIT));  // (3) above
        /* ... */
        return TRUE;
    }
};

Anatomy of the CSpellCheckEdit Class

Admin Stuff

The file starts with an enum that contains the IDs of the commands that will be returned from the call to TrackPopupMenu discussed below. Basically, the deal is that suggestions are added to the context menu, and these values are associated with them.

C++
enum
{
    ID_SPELLCHECK_OPT0=0x8000,
    ID_SPELLCHECK_OPT1,
    ID_SPELLCHECK_OPT2,
    ID_SPELLCHECK_OPT3,
    ID_SPELLCHECK_OPT4,
    ID_SPELLCHECK_OPT5,
    ID_SPELLCHECK_OPT6,
    ID_SPELLCHECK_OPT7,
    ID_SPELLCHECK_OPT8,
    ID_SPELLCHECK_OPT9,
    ID_SPELLCHECK_ADD
};
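
Because the suggestion IDs are consecutive, turning a returned command back into a position in the suggestion list is a subtraction plus a range check. The sketch below repeats only the enum values it needs; suggestionIndexFromCommand is a hypothetical helper name and is not taken from the download.

```cpp
#include <cassert>
#include <cstddef>

// Values consistent with the enum above (OPT1 to OPT8 omitted for brevity).
enum
{
    ID_SPELLCHECK_OPT0 = 0x8000,
    ID_SPELLCHECK_OPT9 = ID_SPELLCHECK_OPT0 + 9,
    ID_SPELLCHECK_ADD
};

// Returns the index of the suggestion that a menu command refers to, or -1
// when the command is not a suggestion ID or points past the end of the
// list that was actually shown.
int suggestionIndexFromCommand(unsigned int cmd, std::size_t suggestionCount)
{
    const unsigned int first = ID_SPELLCHECK_OPT0;
    const unsigned int last = ID_SPELLCHECK_OPT9;
    if (cmd < first || cmd > last)
        return -1;
    const std::size_t index = cmd - first;
    return index < suggestionCount ? static_cast<int>(index) : -1;
}
```

The -1 result covers the "Add Word" command and the standard edit commands, which have to be handled separately.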

The CSpellCheckEdit class itself is derived from CWindowImpl<CSpellCheckEdit, CEdit>, so you will create instances of this class rather than derive from it yourself.

The class has an internal struct (SpError) that represents errors that are found within the text. Each one holds the rectangle that the spelling mistake lies within, the misspelled word, and the character position of the start of the word within the edit control. I've also included a typedef that lets me refer to a std::list of these as SPERRLIST.

C++
struct SpError
{
    CRect rcArea;
    CString word;
    int posn;
};
typedef std::list<SpError> SPERRLIST;

Finding and Drawing Errors

The methods involved in finding and drawing errors are:

  • RedrawErrors (two signatures: one called by event handlers, one called internally)
  • IsWordBreak
  • DrawError
  • DrawSquiggly
  • InvalidateCheck

RedrawErrors (1)

RedrawErrors (the one called by event handlers) clears the list of errors that had previously been found, and then loops through each visible line of text in the control. It invokes the internal RedrawErrors method for each line.

The original code from Matt Gullett's project uses the CEdit::LineLength call incorrectly. The original code assumed that you passed a line number to CEdit::LineLength to get the length of the line. This is not the case. You pass the character offset of a character in the line to get the length of the line containing that character. The upshot is that this:

C++
// FPSSpellingEditCtrl.cpp, lines 190-192 (original):
int iLine = GetFirstVisibleLine();
int iChar = LineIndex(iLine);
int iLineLen = LineLength(iLine);   // wrong: passes a line number

was changed to this:

C++
// SpellCheckEdit.h, lines 91-93 (corrected):
int iLine = GetFirstVisibleLine();
int iChar = LineIndex(iLine);
int iLineLen = LineLength(iChar);   // right: passes a character offset

There is another instance of this same problem being corrected from FPSSpellingEditCtrl.cpp (216, 217) to SpellCheckEdit.h (116, 117).
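
The difference between the two calls is easy to see with a toy model of the control's text. EditModel below is a hypothetical class, not the real CEdit. It only imitates the argument conventions described above: lineIndex takes a line number, lineLength takes a character offset. It assumes lines are separated by "\r\n".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a multi-line edit control's text. Not the real control.
class EditModel
{
public:
    explicit EditModel(const std::vector<std::string>& lines) : lines_(lines) {}

    // Character offset of the first character of the given line, or -1 if
    // there is no such line.
    int lineIndex(std::size_t line) const
    {
        if (line >= lines_.size())
            return -1;
        std::size_t offset = 0;
        for (std::size_t i = 0; i < line; ++i)
            offset += lines_[i].size() + 2;  // +2 for the "\r\n" separator
        return static_cast<int>(offset);
    }

    // Length of the line that contains the given character offset, or 0 if
    // the offset lies beyond the text.
    int lineLength(std::size_t charOffset) const
    {
        std::size_t start = 0;
        for (std::size_t i = 0; i < lines_.size(); ++i)
        {
            const std::size_t end = start + lines_[i].size() + 2;
            if (charOffset < end)
                return static_cast<int>(lines_[i].size());
            start = end;
        }
        return 0;
    }

private:
    std::vector<std::string> lines_;
};
```

With the lines "Land value", "ok" and "A much longer third line", lineLength(lineIndex(2)) gives the length of the third line, while lineLength(2) gives the length of the first line, because offset 2 is still inside it.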

While the original code worked, it looks like it checked each line many times. Possibly as many times as there were characters in the line. I didn't verify that ... I just saw that things were being checked way more often than they should have been.

RedrawErrors (2)

The internally-called RedrawErrors method gets each word from the given line (using IsWordBreak to determine where words break), trims it, and passes it to DrawError.
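
As a rough illustration of that loop, here is a portable sketch that splits a line into words and remembers where each one starts. The word-break rule is an assumption (anything that is not a letter, digit or apostrophe ends a word); the IsWordBreak in the download may use a different set of characters. WordAt and splitWords are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed rule: anything that is not a letter, digit or apostrophe ends a word.
bool isWordBreak(char ch)
{
    const unsigned char c = static_cast<unsigned char>(ch);
    return !(std::isalnum(c) || ch == '\'');
}

// A word together with its character offset inside the line.
struct WordAt
{
    std::size_t offset;
    std::string word;
};

// Walks the line once, skipping break characters and collecting each run of
// non-break characters as a word.
std::vector<WordAt> splitWords(const std::string& line)
{
    std::vector<WordAt> words;
    std::size_t i = 0;
    while (i < line.size())
    {
        while (i < line.size() && isWordBreak(line[i]))
            ++i;  // skip separators
        const std::size_t start = i;
        while (i < line.size() && !isWordBreak(line[i]))
            ++i;  // consume the word
        if (i > start)
        {
            WordAt w;
            w.offset = start;
            w.word = line.substr(start, i - start);
            words.push_back(w);
        }
    }
    return words;
}
```

Each WordAt carries what a DrawError-style routine needs: the text to check, and the offset from which the word's on-screen position can be worked out.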

DrawError

DrawError is the method that actually talks to the SpellCheck object. Given the word that needs to be checked, this method invokes SpellCheck::wordIsOK. If the word is OK, DrawError simply returns before going any further.

If the word is not in the dictionary, DrawError calculates the location and size of the word, and if the bottom of the calculated rectangle is within the bounds of the edit window, calls DrawSquiggly.

Finally, it creates an SpError object with the error's information and adds it to the control's SPERRLIST.

DrawSquiggly

This method (I renamed it from DrawSquigly to DrawSquiggly) simply draws the red dotted line under the misspelled word.

The "squiggly" was originally a jagged line, drawn by a series of oscillating LineTo calls. I thought I could do a bit better than that, and, using GDI+, drew an anti-aliased multipoint Bezier curve under the word. That looked pretty cool. Then, I saw the spell checker in Firefox, and thought that that looked better. So now, my code draws a single dotted line under the misspelled word.
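
For the curious, the jagged variant boils down to generating a list of vertices that alternate between two heights. The sketch below only computes the points. Pt and zigzagPoints are hypothetical names, and the step and amplitude are whatever looks right for the font; this is not the code from either project.

```cpp
#include <cassert>
#include <vector>

struct Pt
{
    int x;
    int y;
};

// Vertices of a zigzag running from left to right. The y value alternates
// between baseline and baseline + amplitude every `step` pixels, and the
// final vertex always sits at x == right.
std::vector<Pt> zigzagPoints(int left, int right, int baseline,
                             int step, int amplitude)
{
    std::vector<Pt> pts;
    if (step <= 0 || right <= left)
        return pts;
    bool down = false;
    for (int x = left; x < right; x += step)
    {
        Pt p;
        p.x = x;
        p.y = down ? baseline + amplitude : baseline;
        pts.push_back(p);
        down = !down;
    }
    Pt last;
    last.x = right;
    last.y = down ? baseline + amplitude : baseline;
    pts.push_back(last);
    return pts;
}
```

A drawing routine would move to the first point and then draw a line to each of the others in turn.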

For the sake of interest, I've left the GDI+ code intact, and you can enable it if you want to. To enable this code, #ifdef the code in DrawSquiggly, and the block starting at wtlspell.cpp (20). You will also have to link with gdiplus.lib.

Handling Events

SubclassWindow

While not an event, this method fires off the timer.

OnDestroy (WM_DESTROY)

This event handler kills the timer, and allows default processing to continue.

OnTimer (WM_TIMER)

This is my least favourite method. It is invoked when the timer fires. The first thing it does is kill the timer. Later on, it recreates it.

OnSetText (WM_SETTEXT)

I added this method to the original class so that the text of the window is checked automatically whenever it is set, for instance by a DDX_TEXT macro.

OnChange (WM_COMMAND:EN_CHANGE)

When the program receives an EN_CHANGE message, it invalidates the current checked state and causes the visible text to be checked again.

OnPaint (WM_PAINT)

This handler causes the program to redraw the errors if any are known to exist.

OnScroll (WM_HSCROLL, WM_VSCROLL)

Because the program only checks and redraws text in the visible lines of the edit box, scrolling means that the visible area changes, so the check needs to be redone. The rectangles associated with the errors will also be changed, and this is why we need to handle the horizontal scroll messages.

OnLButton (WM_LBUTTONDOWN, WM_LBUTTONUP)

I actually don't know why I handle these messages. There must have been a reason.

OnKeyDown (WM_KEYDOWN)

Not every key press results in a change to the text (an EN_CHANGE message), so this handler covers those cases where the errors should be redrawn despite the fact that the text is unchanged. Perhaps a selection is being extended using Shift+Arrow. The error squiggles need to be redrawn in this case.

OnContextMenu (WM_CONTEXTMENU)

This is the most interesting event handler (I think). This handler builds and shows the context menu for the spell checker. Here's what it does in overview:

  1. Get the point in the control where the right-click took place (client coordinates).
  2. If the click was not inside a misspelled word, allow the framework to handle the event in the default manner.
  3. Create a popup menu. Note that you don't create a menu, you create a popup menu. It took me quite a while to figure that out again. Sigh.
  4. Get a list of suggestions from the SpellCheck singleton.
  5. Add the first (up to) 10 suggestions to the menu.
  6. Add a separator, the "Add Word" item, and another separator.
  7. Add the normal Edit context menu items (Undo, Cut, Copy, Paste, Delete, Select All).
  8. Invoke the TrackPopupMenu method.
  9. Handle the user's selection.
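
Steps 2, 5 and 6 can be sketched without any Windows types. Everything here is an assumption made for illustration: Rect and SpellError mirror SpError with portable types, MenuEntry is a plain description of a menu item (id 0 meaning a separator), and the two ID constants repeat ID_SPELLCHECK_OPT0 and ID_SPELLCHECK_ADD from the enum shown earlier. The real handler builds an actual popup menu instead.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

struct Rect { int left, top, right, bottom; };
struct SpellError { Rect area; std::string word; int posn; };

// Step 2: find the error under the click, if any. Left and top edges are
// inside the rectangle, right and bottom edges are outside.
const SpellError* errorAt(const std::list<SpellError>& errors, int x, int y)
{
    for (std::list<SpellError>::const_iterator it = errors.begin();
         it != errors.end(); ++it)
    {
        const Rect& r = it->area;
        if (x >= r.left && x < r.right && y >= r.top && y < r.bottom)
            return &*it;
    }
    return 0;  // not on a misspelled word: fall back to default handling
}

struct MenuEntry { unsigned int id; std::string label; };  // id 0 = separator

const unsigned int kFirstSuggestionId = 0x8000;  // ID_SPELLCHECK_OPT0
const unsigned int kAddWordId = 0x800A;          // ID_SPELLCHECK_ADD
const std::size_t kMaxSuggestions = 10;

MenuEntry makeEntry(unsigned int id, const std::string& label)
{
    MenuEntry e;
    e.id = id;
    e.label = label;
    return e;
}

// Steps 5 and 6: up to ten suggestions, then separator, "Add Word", separator.
std::vector<MenuEntry> buildSpellMenu(const std::vector<std::string>& suggestions)
{
    std::vector<MenuEntry> menu;
    const std::size_t count = suggestions.size() < kMaxSuggestions
                                  ? suggestions.size() : kMaxSuggestions;
    for (std::size_t i = 0; i < count; ++i)
        menu.push_back(makeEntry(kFirstSuggestionId + static_cast<unsigned int>(i),
                                 suggestions[i]));
    menu.push_back(makeEntry(0, ""));
    menu.push_back(makeEntry(kAddWordId, "Add Word"));
    menu.push_back(makeEntry(0, ""));
    return menu;
}
```

The standard edit items from step 7 would be appended after the final separator, and the ID that comes back in step 8 decides which branch of step 9 runs.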

Points of Interest

After I'd done all of this code, and settled down to write an article, I discovered Curtis J's Spell Checking Edit Control (Using HunSpell) article. Argh! I could have used his, and just ported his edit control to WTL! Ah well, there are another couple of points of difference between his code and mine ... I would strongly urge you to check his for "Ignore" functionality, dictionaries for different languages, a more comprehensive "user dictionary", and a lot more VERIFYs than I have.

History

  • 2009-06-22: v1.0 - Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Australia

Comments and Discussions

 
Question: I want to create my own .aff and .dic files. Have you any idea how to create it
ssamit, 12-Dec-11 20:10
I want to do the spellchecker for Marathi Language as same as your creation but not in unicode Marathi. because it is available.
Actually I have Marathi feeding over ASCII code.( As just to used different types of fonts. ie. Marathi font is wrapped around the english alphabets.)

Now, I want to make spellchecker and auto correction utility for that. I have some questions regarding to that.


Can a simple english dictionary with that ASCII database correct the word in marathi.?
(When that database have equivalent database in the ASCII format)

If so....would you please guide me regarding that.

Have you english spellchecker complete source code in java or c# or any.

Have you any idea how to create .dic and .aff files.

Kindly reply...

Yours Faithfully,
Amit B. Sarode

Answer: Re: I want to create my own .aff and .dic files. Have you any idea how to create it
_oti, 12-Dec-11 20:28

Question: Hunspell Library integration
caharim, 24-Mar-11 5:47

Answer: Re: Hunspell Library integration
_oti, 24-Mar-11 11:04

General: Vb.Net version
Anthony Daly, 21-Jun-09 23:12

General: Re: Vb.Net version
_oti, 22-Jun-09 11:55

General: Re: Vb.Net version
Anthony Daly, 23-Jun-09 4:01
