
NetSpell

Introduction

The NetSpell project is a spell checking engine written entirely in managed C# .NET code. NetSpell's suggestions for a misspelled word are generated using phonetic (sounds like) matching and ranked by a typographical (looks like) score. NetSpell supports multiple languages, and its dictionaries are based on the OpenOffice Affix compression format. The library can be used in Windows Forms or Web Forms projects. The download includes an English dictionary; dictionaries for other languages are available for download on the project web site. NetSpell also supports user-added words and automatic creation of user dictionaries, and it includes a dictionary build tool for building custom dictionaries.

The Dictionary

The root of all spell checkers is a word list, aka the dictionary.  The dictionary contains a list of common words for a language.  For example, the US English dictionary that comes with this package contains 162,573 words.

When designing the dictionary for NetSpell, I wanted it to be a single file, be as small as possible, and load extremely fast. I experimented with many different ways to save and load the large word lists. The techniques I tried ranged from a saved DataSet to a binary serialized collection, all of which proved to be too slow. I ended up using a good old UTF-8 text file. Loading and parsing a text file proved to be extremely fast.

Affix Compression

The first issue I wanted to tackle was the file size.  Any compression scheme would have to decompress really fast. I first tried using zip.  While the dictionary loaded in the one second range, it was still not fast enough to be used in a web environment.

My research into spell checkers turned up a popular technique called affix compression. Affix compression stores only a base word together with the prefix and suffix rules that can be applied to it to create other words. The affix compression scheme was originally developed by Geoff Kuenning for his ISpell spell checker. The OpenOffice project expanded the scheme to simplify its rule definitions. The NetSpell affix implementation is largely based on the OpenOffice MySpell format. Read the following link to better understand the affix compression format.

As a result of using the OpenOffice dictionary format, NetSpell dictionaries are easily created from OpenOffice dictionaries.  This allows NetSpell to easily support additional languages.  The NetSpell download includes a dictionary build tool that allows you to create new dictionaries.  The build tool also allows you to import OpenOffice dictionaries and save them to the NetSpell format.

Dictionary Sections

To satisfy the goal of making the dictionary a single file, I needed a way to separate the different sections of the file. This allows different types of data to be stored, since a word list is not the only data that needs to be stored. I decided to use the INI section format. I thought about using XML, but XML adds considerable file size because of its tags. I ended up with the following sections in the file.

[Copyright]
The Copyright section contains any copyright information about the word list for the particular dictionary.

[Try]
The try section contains a sequence of letters that are used to try to make a correct word out of a misspelled word. They should be listed on a single line in order of character frequency (highest to lowest).  This section is used by the Near Miss Strategy discussed later.

[Replace]
The replace section contains a sequence of letter combinations that are often misspelled, for example ei and ie.  The data is entered in this section in a search characters space replace characters format.  The ei, ie example would look like this in the dictionary, "ei ie". This section is used by the Near Miss Strategy discussed later.

[Prefix]
The prefix section is used to define a set of affix rules for prefixes that can be attached to the beginning of a base word to form other words.  The format of these rules follows the same format as OpenOffice's affix files except the PFX is removed. You can read more about the OpenOffice affix format here

[Suffix]
The suffix section is used to define a set of affix rules for suffixes that can be attached to the end of a base word to form other words.  The format of suffix rules follows the same format as OpenOffice's affix files except the SFX is removed. You can read more about the OpenOffice affix format here

[Phonetic]
The phonetic section is optional and it contains a set of rules that define how to generate a phonetic code for a character or set of characters.  The phonetic code is generated using Lawrence Philips' Metaphone Algorithm that has been adapted to a table format by the ASpell project.  The NetSpell dictionary uses the same format that ASpell uses.  ASpell phonetic maps can be used directly by NetSpell.  See the following link to learn more about the ASpell phonetic code.

[Words]
The words section is the list of base words for the dictionary.  The format for this section is word/affix keys/phonetic code.  The affix keys and phonetic code portions are optional.  The affix keys portion indicates which affix rules apply to this word.  The phonetic code portion is a cache of the phonetic code for this word and is used by the phonetic suggestion strategy. 
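
As a rough illustration of the word/affix keys/phonetic code layout, the sketch below splits one [Words] line into its three parts. WordEntry and WordLineParser are hypothetical names made up for this example and are not NetSpell's actual types, and the sample values in the usage note are invented.

```csharp
using System;

// Hypothetical holder for one [Words] entry; not NetSpell's actual type.
public sealed class WordEntry
{
    public string Text = "";
    public string AffixKeys = "";     // empty when the line has no affix keys
    public string PhoneticCode = "";  // empty when no code is cached
}

public static class WordLineParser
{
    // Splits "word/affix keys/phonetic code"; the last two parts are optional.
    public static WordEntry Parse(string line)
    {
        string[] parts = line.Trim().Split('/');
        WordEntry entry = new WordEntry();
        entry.Text = parts[0];
        if (parts.Length > 1) entry.AffixKeys = parts[1];
        if (parts.Length > 2) entry.PhoneticCode = parts[2];
        return entry;
    }
}
```

For instance, WordLineParser.Parse("walk/DGS/WLK") would yield the word "walk" with affix keys "DGS" and a cached code of "WLK", while a bare line such as "axe" leaves both optional parts empty.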

Another important thing to know about dictionaries is that they are named to match the .NET Framework CultureInfo.Name property. For example, the US English dictionary is named "en-US.dic", where en-US matches the CultureInfo.Name property. This allows the NetSpell library to default to the dictionary that corresponds to the computer's regional settings.
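
As a small illustration of this naming convention, the sketch below builds the expected file name from a culture. DictionaryName and its For method are hypothetical names for this example and are not part of the NetSpell API; only CultureInfo.Name comes from the .NET Framework.

```csharp
using System.Globalization;

public static class DictionaryName
{
    // Builds a file name such as "en-US.dic" from the culture's Name property.
    public static string For(CultureInfo culture)
    {
        return culture.Name + ".dic";
    }
}
```

Passing CultureInfo.CurrentCulture would give the name that corresponds to the computer's regional settings.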

Spell Checking Text

Spell checking is normally performed by searching the dictionary for the given word. With affix compression, searching becomes more complicated because we have to derive possible base words from the given word. The process goes like this. First, the base word list is searched for the given word. If the word is not found, each suffix rule that applies is removed from the word, and after each removal the resulting word is checked against the base word list. If the word is still not found, the same process is repeated for the prefixes. If the word can't be found after removing the suffixes and prefixes, then the word is not in the dictionary and is most likely misspelled.
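
The lookup order described above can be sketched roughly as follows. This is a simplified illustration and not NetSpell's implementation: AffixRule and TinyDictionary are hypothetical types, rule conditions are ignored, and a word carrying both a prefix and a suffix is not handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified affix rule: forming a word removes Strip from the
// base word and attaches Add in its place. Rule conditions are left out.
public sealed class AffixRule
{
    public char Key;
    public string Strip;
    public string Add;
    public AffixRule(char key, string strip, string add) { Key = key; Strip = strip; Add = add; }
}

public sealed class TinyDictionary
{
    // Base word -> affix keys allowed for that word.
    public Dictionary<string, string> BaseWords = new Dictionary<string, string>();
    public List<AffixRule> Suffixes = new List<AffixRule>();
    public List<AffixRule> Prefixes = new List<AffixRule>();

    public bool Contains(string word)
    {
        // 1. Search the base word list for the given word.
        if (BaseWords.ContainsKey(word)) return true;

        // 2. Undo each suffix rule and look up the resulting base word.
        foreach (AffixRule rule in Suffixes)
        {
            if (!word.EndsWith(rule.Add, StringComparison.Ordinal)) continue;
            string candidate = word.Substring(0, word.Length - rule.Add.Length) + rule.Strip;
            if (HasKey(candidate, rule.Key)) return true;
        }

        // 3. Repeat the same process for the prefix rules.
        foreach (AffixRule rule in Prefixes)
        {
            if (!word.StartsWith(rule.Add, StringComparison.Ordinal)) continue;
            string candidate = rule.Strip + word.Substring(rule.Add.Length);
            if (HasKey(candidate, rule.Key)) return true;
        }
        return false; // not found: most likely misspelled
    }

    // True when baseWord is in the list and is allowed to use the given rule key.
    private bool HasKey(string baseWord, char key)
    {
        string keys;
        return BaseWords.TryGetValue(baseWord, out keys) && keys.IndexOf(key) >= 0;
    }
}
```

Because conditions are skipped here, the sketch accepts some forms a real rule set would reject; it is only meant to show the order of the lookups.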

Generating Suggestions

Once it has been determined that the word is misspelled, we need to generate suggestions for the correct spelling of that word. This is where the magic of a spell checker happens. NetSpell uses two different techniques to generate suggestions. The first was developed by Geoff Kuenning for ISpell and is commonly called the near miss strategy. The second is Lawrence Philips' Metaphone Algorithm, which returns a code that is a rough approximation of how an English word sounds.

Near Miss Strategy

The near miss strategy is a fairly simple way to generate suggestions.  Near miss takes the approach that the user didn't necessarily misspell the word but rather they mistyped it.  Two words are considered near if they can be made identical by inserting a blank space, interchanging two adjacent letters, changing one letter, deleting one letter or adding one letter. If a valid word is generated using these techniques, then it is added to the suggestion list. As you might have guessed, the near miss strategy doesn't provide the best list of suggestions when a word is truly misspelled.  That is where the phonetic strategy kicks in.
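
A minimal sketch of the five near miss edits listed above might look like the following. It is not NetSpell's code: NearMiss, Suggest and the isWord callback are hypothetical names, tryChars stands in for the letters of the [Try] section, and the [Replace] pairs are not used.

```csharp
using System;
using System.Collections.Generic;

public static class NearMiss
{
    // Returns every valid word that is one near miss edit away from 'word'.
    public static List<string> Suggest(string word, string tryChars, Func<string, bool> isWord)
    {
        var found = new List<string>();
        Action<string> consider = candidate =>
        {
            if (candidate != word && !found.Contains(candidate) && isWord(candidate))
                found.Add(candidate);
        };

        // Interchange two adjacent letters.
        for (int i = 0; i < word.Length - 1; i++)
        {
            char[] c = word.ToCharArray();
            char t = c[i]; c[i] = c[i + 1]; c[i + 1] = t;
            consider(new string(c));
        }

        // Delete one letter, or change one letter to each try character.
        for (int i = 0; i < word.Length; i++)
        {
            consider(word.Remove(i, 1));
            foreach (char ch in tryChars)
                consider(word.Substring(0, i) + ch + word.Substring(i + 1));
        }

        // Add one letter at every position.
        for (int i = 0; i <= word.Length; i++)
            foreach (char ch in tryChars)
                consider(word.Insert(i, ch.ToString()));

        // Insert a blank space: both halves must be valid words.
        for (int i = 1; i < word.Length; i++)
        {
            string left = word.Substring(0, i), right = word.Substring(i);
            string pair = left + " " + right;
            if (isWord(left) && isWord(right) && !found.Contains(pair))
                found.Add(pair);
        }
        return found;
    }
}
```

With a word list containing "the", a call such as NearMiss.Suggest("teh", "abcdefghijklmnopqrstuvwxyz", words.Contains) finds "the" through the adjacent-letter swap.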

Phonetic Strategy

To understand the phonetic strategy, phonetic code needs to be defined.  A phonetic code is a rough approximation of how the word sounds.  Each character in the code represents a sound.  It's important to also understand that the phonetic code does not indicate how to pronounce the word; it's only a representation of how it sounds.

The phonetic strategy compares the phonetic code of the misspelled word to the phonetic codes of all the words in the word list. If the phonetic codes match, then the word is added to the suggestion list.

While that process may sound straightforward, it becomes much more complicated when affix compression is introduced. An affix compressed word list only contains base words. We can't just compare the phonetic code of the misspelled word to the word list because the misspelled word may or may not be a base word. To solve this issue, we remove from the misspelled word every affix whose rule conditions it satisfies and add each resulting string to a possible base word list. An important note to keep in mind is that the possible base word list is not a list of real words. It is only a list of strings that can be made by removing affixes from the misspelled word.

Now that we have a list of possible base words from the misspelled word, we can generate phonetic codes for them and compare those codes with the codes of the base words. If one of the codes matches a base word's code, we add that base word to the list of suggestions. Since we removed the affixes and compared only base words, an expanded form of the base word could be the correct word. So, we expand the base word that matched by applying all of its affix rules to get a list of all the possible words from that base word. We then add that list to the suggestion list.
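
The basic code comparison step can be sketched as below. The encoder here, ToyCode, is a deliberately crude stand-in invented for this example (it keeps the first letter, drops later vowels and collapses repeated letters); it is not Metaphone and not the ASpell table format, and the affix removal and expansion steps are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PhoneticSketch
{
    // Toy stand-in for a real phonetic encoder; this is NOT Metaphone.
    public static string ToyCode(string word)
    {
        var code = new StringBuilder();
        foreach (char raw in word.ToUpperInvariant())
        {
            bool vowel = "AEIOU".IndexOf(raw) >= 0;
            // After the first letter, skip vowels and immediate repeats.
            if (code.Length > 0 && (vowel || code[code.Length - 1] == raw)) continue;
            code.Append(raw);
        }
        return code.ToString();
    }

    // Returns the base words whose code equals the code of the misspelled word.
    public static List<string> Suggest(string misspelled, IEnumerable<string> baseWords)
    {
        string target = ToyCode(misspelled);
        var matches = new List<string>();
        foreach (string baseWord in baseWords)
            if (ToyCode(baseWord) == target) matches.Add(baseWord);
        return matches;
    }
}
```

With this toy encoder, "recieve" and "receive" both reduce to the same code, so "receive" is suggested while "relieve" and "recipe" are not. A cached code in the [Words] section would avoid recomputing the code for every base word.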

Ranking Suggestions

Once we have a list of suggestions, we need some way to rank them by how likely they are to be the correct spelling. My research into the best way to go about this turned up the Edit Distance Algorithm. The edit distance is defined as the smallest number of insertions, deletions, and substitutions required to change one string into another. The NetSpell Edit Distance Algorithm implementation has one slight modification in that it adds an extra edit distance if the first character and last character don't match. The rationale behind this is that people generally get the first character and last character correct when trying to spell a word.
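
A minimal sketch of such a ranking score is shown below. It uses the standard dynamic programming form of the edit distance and, as one possible reading of the modification described above, adds one point when the first characters differ and one more when the last characters differ. RankingSketch and Distance are hypothetical names, not NetSpell's API.

```csharp
using System;

public static class RankingSketch
{
    // Edit distance (insertions, deletions, substitutions) plus a penalty
    // for a mismatched first or last character. Lower scores rank higher.
    public static int Distance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all of a
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert all of b

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }

        int score = d[a.Length, b.Length];
        if (a.Length > 0 && b.Length > 0)
        {
            if (a[0] != b[0]) score++;                       // assumed penalty
            if (a[a.Length - 1] != b[b.Length - 1]) score++; // assumed penalty
        }
        return score;
    }
}
```

Sorting the suggestion list by Distance(misspelledWord, suggestion) in ascending order would then put the closest looking words first.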

Using the Library

To use the NetSpell Library in your project you simply add a reference to NetSpell.SpellChecker.dll to the project.  You can also add the library to the Visual Studio Toolbox to make it easier to interact with the properties. The library is event based so you have to handle the various events.  Also, if you set the ShowDialog property to true, the library will use its internal suggestion form to display the suggestion when a MisspelledWord event occurs.

The following code is a very simple implementation of the NetSpell library.

internal System.Windows.Forms.RichTextBox Document;
internal NetSpell.SpellChecker.Spelling SpellChecker;

// add event handlers
this.SpellChecker.MisspelledWord +=
    new NetSpell.SpellChecker.Spelling.MisspelledWordEventHandler(
        this.SpellChecker_MisspelledWord);
this.SpellChecker.EndOfText +=
    new NetSpell.SpellChecker.Spelling.EndOfTextEventHandler(
        this.SpellChecker_EndOfText);
this.SpellChecker.DoubledWord +=
    new NetSpell.SpellChecker.Spelling.DoubledWordEventHandler(
        this.SpellChecker_DoubledWord);

private void SpellChecker_DoubledWord(object sender,
    NetSpell.SpellChecker.SpellingEventArgs args)
{
    // update text
    this.Document.Text = this.SpellChecker.Text;
}

private void SpellChecker_EndOfText(object sender,
    System.EventArgs args)
{
    // update text
    this.Document.Text = this.SpellChecker.Text;
}

private void SpellChecker_MisspelledWord(object sender,
    NetSpell.SpellChecker.SpellingEventArgs args)
{
    // update text
    this.Document.Text = this.SpellChecker.Text;
}

// Start Spell Checking
SpellChecker.Text = this.Document.Text;
SpellChecker.SpellCheck();

The project download includes two example applications for the NetSpell Library. The first is a Windows Forms text editor. The second is a web project that demonstrates using the library in a web environment.

Conclusion

The NetSpell project has been a fun and challenging project to work on. I plan to continue to improve and add new features to the library. The feature that I'm currently working on is real time spell checking, like MS Word. Please feel free to contact me with any suggestions, bug reports and feature requests.

Paul Welter
http://www.loresoft.com/

Question: Dictionary file types supported by the NetSpell checker?
snangliya
6:02 8 Feb '10  
hi,

I want to know which dictionary file types are supported by the NetSpell checker.
General: Multiple Text Boxes
shortdog
8:23 23 Oct '09  
Has anyone figured out a good way to use this project with multiple text boxes on the same form? I would like to call it with separate text strings or controls. I have been pondering modifying it so it is not event based.
General: Re: Multiple Text Boxes
shortdog
6:04 26 Oct '09  
Well, since it doesn't appear anyone has tackled this (or tackled it and wishes to share), I have modified Spelling.cs to be able to operate on multiple controls. It should work on any control that has the Text property. If anyone seems interested in this, I could write up an article.

Basically I added a Queue to the private variables that holds objects of System.Windows.Forms.Control type.
Then I created a new public method that allows me to add controls to that queue. Make sure that the method accepts a reference to the control because instead of using the events to update the text in the control, I am going to reassign the text directly to the control. In the calling code this method should be called for each control you want to spell check the text on, prior to calling the SpellCheck() method.

Next, in the SpellCheck() method (no parameters), I check whether the count in the queue is equal to zero. If so, I call the SpellCheck function as it was already being called (with the _wordIndex, this.WordCount-1 parameters). This allows for backwards compatibility. If the count is greater than zero, I pop the control off the queue and use reflection to determine whether it has a Text property. If so, I get the text, assign it to the Spelling.Text property, and call the SpellCheck function just as it was being called prior to modification.

Now, in the OnDeletedWord and OnReplacedWord event notifiers, I test whether I have a control that was popped off the queue; if so, I assign the Spelling.Text property to the control's Text property. I left the existing event notifiers in place for backwards compatibility.

The last bit was to modify the OnEndOfText event notifier. I added code to see if the count in the queue is greater than zero. If so, I call SpellCheck() again, which will pop the next control off the queue and check its spelling. In this way the EndOfText event will not be fired until all controls have been checked.

Note that using this method you would not use the events as described in this article. The events still fire, so you could do something else, but since we are modifying references to the controls inside the Spelling object, if you used the event to reassign the text, I think you could get some interesting results.

This is obviously not the only way to do this, and maybe not even the best way, but it suits my needs. It could be improved upon by using reflection to find all the controls on a form and automatically call SpellCheck on them, but that was beyond the scope of my needs.
General: Re: Multiple Text Boxes
Dave Sheets
7:07 13 Nov '09  
I would be very interested in seeing this - I was thinking of doing the exact same thing with a bit of a twist: creating a dynamic RichTextBox that would iterate 'x' times based on a returned recordset. So let's say I have 5 records; I would have 5 dynamic RichTextBoxes that would be created using this control. I've been having some difficulty in getting it to work, however. Can you provide a sample of your modification? TIA!
General: Thread safety
Stonkie
8:00 2 Oct '09  
Hi,

First, thanks a lot for this great product! We are currently using it to spell check and generate suggestions for cities, states and even zip codes and postal codes! We had to hack it a little bit to support multiple words (simply replace all spaces with underscores in both the dictionary and the word to check), but since then, everything is working like a charm!

Now, for my question!

We plan to use this in a web service and we intend to cache the spell checker objects and the dictionaries between requests (using lazy initialization). What I want to know is, what must I do to make this thread safe between page requests? Can we reuse only the dictionaries and have a separate spell checker instance for each thread?

Thanks!
General: Underlining word style
jammmie999
22:44 6 Jun '09  
Hi

Is it possible to implement underlining of misspelled words like MS Word does?

Thanks
General: Re: Underlining word style
Ant2100
6:17 16 Aug '09  
Any replies to this?? I kinda need the same thing :/

Check out my desktop conversion software for Windows -
www.universalconverter.net

General: Re: Underlining word style
Dave Sheets
7:09 13 Nov '09  
Ditto! This would be an awesome enhancement!
Answer: Re: Underlining word style
Ant2100
5:04 7 Feb '10  
Take a look at this! =)

It uses hunspell but is absolutely awesome!

NHunspellTextBoxExtender - A Spellchecking IExtenderProvider for TextBoxes using Hunspell for .NET
General: How do I make the spell checker window a modal window
Padoor Shiras
2:41 27 Apr '09  
I appreciate the work you have done. I am not sure how to make the spell checker window a modal dialog.

I am using version 2.1.7.41329 of NetSpell.SpellChecker.dll.
Is this the latest version?
General: Adding NetSpell to an ASP.NET app with dynamically built pages
Dwight Johnson
6:46 10 Apr '09  
Wonderful tool!

I added NetSpell to an application where I dynamically build all the controls in each page. None of the demos show this.

1) I first downloaded the most current version from SourceForge (http://sourceforge.net/projects/netspell/). I unzipped it into a folder.

2) I added existing items to my app: SpellCheck.aspx, spell.css, and spell.js.

3) I added a reference in my app to NetSpell.SpellChecker.dll, which got copied into /Bin.

4) I created /Bin/Dictionary, and added the .dic files to it.

5) In the web.config, appSettings, I added an entry with key="DictionaryFolder" value="/Bin/Dictionary"

6) In the Page_Init of SpellCheck.aspx.cs, I had to make a slight change to find the Dictionary folder.
The original had this:
folderName = this.MapPath(Path.Combine(Request.ApplicationPath, folderName));
I changed it to this:
folderName = this.MapPath("." + Path.Combine(Request.ApplicationPath, folderName));

7) In [MyPage].aspx.cs, I added "protected Button btnCheckSpelling", and did a FindControl for it (always needed when controls are added dynamically).

8) Also in [MyPage].aspx.cs, I added these lines to add spell.js to the page:
StringBuilder script = new StringBuilder();
...
script.Append("<script language=\"JavaScript\" src=\"spell.js\" type=\"text/javascript\"></script>");
if (!ClientScript.IsStartupScriptRegistered("onload"))
    ClientScript.RegisterStartupScript(this.GetType(), "onload", script.ToString());

9) In the class [MyPage].aspx.cs calls to dynamically create all its controls, just below where I create the text box I wanted to spellcheck, I added a button:
// btnCheckSpelling
row = new TableRow();
cell = new TableCell();
btn = new Button();
btn.ID = "btnCheckSpelling";
btn.Text = "Check Spelling";
btn.OnClientClick = "checkSpelling()";
cell.Controls.Add(btn);
row.Controls.Add(cell);

The key was the OnClientClick property, which said to run checkSpelling() in spell.js when the button was clicked.
General: Re: Adding NetSpell to an ASP.NET app with dynamically built pages
kal777
21:33 26 Apr '09  
Thank you so much. I followed your instructions and they worked like a charm.
General: Re: Adding NetSpell to an ASP.NET app with dynamically built pages
Dwight Johnson
3:36 27 Apr '09  
Glad to hear of your success.

Did you have the same issue of finding the folderName as I mentioned in step 6? I'm thinking it might relate to the way my particular project was structured. Just curious.

Dwight
General: Re: Adding NetSpell to an ASP.NET app with dynamically built pages
kal777
7:10 27 Apr '09  
No, I had my DLLs in the Bin folder, so I added the .dic files and the DLL to that folder and never had a problem finding the folder name.
Instead of adding the buttons and script in the code-behind, I included spell.js in the header section and just set the OnClientClick property of the button in the code-behind.

I looked for steps on how to use netspell for more than an hour but only found the solution from you.
Thank you so much again,
kal
Question: Want to use the NetSpell checker with an OpenOffice Arabic dictionary
wohpal
0:04 1 Apr '09  
I am using the NetSpell checker in my project and want to keep using it for spell checking Arabic, but I am not able to get it working. I have assigned the Arabic dictionary name to the DictionaryFile property, but I am not able to check spelling because the spell check window is not shown. I also tried to convert the open source Arabic dictionary with NetSpell.DictionaryBuild.exe, but after saving it, the dictionary file shows junk characters.

How can I do this?
General: Fix for NetSpell not recognizing words that OpenOffice does [modified]
rubem
13:01 27 Feb '09  
Hi,

After a lot of testing, I think I found a fix for NetSpell not recognizing words that OpenOffice does. The reason is that NetSpell does not correctly process Affix entries that have a "0" (zero) as the third parameter, like in the sample below:

[Suffix]
M Y 1
M x 0 axe

[Words]
axe/M
pickaxe/M
saxe/M

This entry should instruct the dictionary to remove the last "x" from words ending in "axe", but will fail in NetSpell. So in this example "ax", "pickax" and "sax" will be flagged as incorrect.

To solve this, in WordDictionary.cs (path: src\NetSpell.SpellChecker\Dictionary\), line 458, change:

entry.AddCharacters = partMatches[2].Value;

to

if(partMatches[2].Value != "0") entry.AddCharacters = partMatches[2].Value;

That's it! Hope this is useful for someone.

modified on Friday, February 27, 2009 6:17 PM

General: Re: Fix for NetSpell not recognizing words that OpenOffice does
Ant2100
2:25 17 Aug '09  
Why did someone vote this 2? Is this fix incorrect, or were they just having a bad day? Can someone please confirm if this fix works?

Thanks,

Anthony

General: Alternative: NHunspell - OpenOffice Hunspell spell checker for .NET
Thomas Maierhofer
11:09 24 Feb '09  
I've ported the Hunspell spell checker library (OpenOffice, Firefox and Thunderbird) to .NET. Most of the OpenOffice dictionaries are LGPL, so you can use them in your applications.

Here is the project on Sourceforge:
http://nhunspell.sourceforge.net/


General: Re: Alternative: NHunspell - OpenOffice Hunspell spell checker for .NET
Uwe Keim
3:17 27 Jun '09  
Sounds nice!

Any examples on how to use, beside the 10-liner on the website?

General: Re: Alternative: NHunspell - OpenOffice Hunspell spell checker for .NET
Thomas Maierhofer
3:12 16 Nov '09  
I've added a complete VS Sample with C# and Visual Basic.
NHunspell files with spell checking, hyphenation and thesaurus console and ASP.NET sample project (C# and Visual Basic)
NHunspell spell checking, hyphenation and thesaurus class documentation (MSDN like)
Any suggestions are welcome.

Regards Thomas


General: Help?
papaioannoua
2:49 30 Jan '09  
I'm trying to use NetSpell but I have a problem. How can I apply the spelling changes after the spelling dialog?

Please answer as soon as possible,
Alex
General: Possible Bug
jfarias
7:09 23 Jan '09  
I came across an issue when using NetSpell. I entered in 'the' twice with each one spelled wrong (teh teh). NetSpell pops up with a suggestion list for the first word. After clicking 'Replace' with the correct spelling it then displays the second word. After clicking 'Replace' again with the correct spelling it displays the previous corrected word with the correct spelling as a misspelled word and has no suggestions displayed. The word count is 2 but NetSpell acts as if there were 3. Just wanted to pass this along in case you haven't noticed it and wanted to fix it.

P.S. I like what you did with NetSpell. Very good job.
General: Re: Possible Bug
niki_4810
10:02 7 Aug '09  
Is there a way to fix this bug?
General: Vista compatibility
chiya
8:22 22 Dec '08  
Thanks for the very useful component. Does this work on Vista?
General: Re: Vista compatibility
Emmanuel Process
10:38 2 Apr '09  
Vista supports the .NET Framework 2.0, so in turn it supports the 2.0 version of this project.


Last Updated 22 Oct 2003 | Copyright © CodeProject, 1999-2010