Spell Check, Hyphenation, and Thesaurus for .NET with C# and VB Samples - Part 1: Single Threading

By Thomas Maierhofer, 16 Nov 2009
 

Introduction

Spell checking, hyphenation, and synonym lookup via a thesaurus are the functions of Hunspell, the Open Office spell checker. The NHunspell project makes these functions available to .NET applications. Because Hunspell is used in a large number of Open Source applications, it is a natural first choice for .NET applications as well. Beyond Open Office, Hunspell is currently used in the Mozilla applications Firefox and Thunderbird, the browsers Google Chrome and Opera, and, last but not least, Apple's Mac OS X 10.6 "Snow Leopard" operating system.

Since the first steps (NHunspell - Hunspell for the .NET platform), NHunspell has improved a lot and is heading straight for its first release candidate. The current release, 0.9.2, is a milestone because its support for the Hunspell functions is nearly complete.

Using NHunspell spell check, hyphenation, and thesaurus in single threaded applications

NHunspell is designed to serve two different use cases: single threaded applications, like word processors and any other tool with a UI/GUI, and multi threaded applications like servers and web servers (ASP.NET).

This article covers single threaded applications. They use the basic NHunspell classes Hunspell, Hyphen, and MyThes, whose members aren't thread safe. If these classes are used by multiple threads, a synchronization mechanism such as lock must be used. NHunspell also provides special multi-threading classes, which are covered in the second part of the article: Spell checking, hyphenation, and thesaurus in multi-threading applications.
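
As a rough illustration of the lock-based synchronization mentioned above, here is a minimal sketch. The wrapper class `LockedSpellChecker` and its member names are hypothetical and not part of NHunspell; only the `Hunspell` constructor, `Spell()`, `Suggest()`, and disposal are taken from the samples in this article.

```csharp
using System;
using System.Collections.Generic;
using NHunspell;

// Hypothetical helper (not part of NHunspell): serializes all access to a
// single Hunspell instance so that several threads can share it.
public sealed class LockedSpellChecker : IDisposable
{
    private readonly object _sync = new object();
    private readonly Hunspell _hunspell;

    public LockedSpellChecker(string affFile, string dicFile)
    {
        _hunspell = new Hunspell(affFile, dicFile);
    }

    public bool Spell(string word)
    {
        // Only one thread at a time may call into the Hunspell object.
        lock (_sync)
        {
            return _hunspell.Spell(word);
        }
    }

    public List<string> Suggest(string word)
    {
        lock (_sync)
        {
            return _hunspell.Suggest(word);
        }
    }

    public void Dispose()
    {
        // Take the lock so that disposal cannot overlap a running call.
        lock (_sync)
        {
            _hunspell.Dispose();
        }
    }
}
```

Because every call waits for the same lock, this approach limits throughput under load; it is best suited to occasional access from a background thread.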

Spell checking: Hunspell

Hunspell objects offer several ways to work with text:

  • Spell check and suggestion for a misspelled word: with Spell() and Suggest()
  • Morphological analysis and word stemming: with Analyze() and Stem()
  • Generation by sample (deriving a word from its stem, like girl => girls): with Generate()

A C# sample of spell checking, suggestion, analysis, stemming, and generation with Hunspell:

using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Hunspell - Spell Checking Functions");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");

    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly");
    bool correct = hunspell.Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " + 
       (correct ? "correctly" : "incorrectly"));

    Console.WriteLine("");
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = hunspell.Suggest("Recommendatio");
    Console.WriteLine("There are " + 
       suggestions.Count.ToString() + " suggestions" );
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion );
    }

    Console.WriteLine("");
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = hunspell.Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }

    Console.WriteLine("");
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = hunspell.Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }

    Console.WriteLine("");
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = hunspell.Generate("girl","boys");
    foreach (string word in generated)
    {
        Console.WriteLine("Generated word is: " + word);
    }
}

A Visual Basic sample of spell checking, suggestion, analysis, stemming and generation with Hunspell:

Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
    Console.WriteLine("Hunspell - Spell Checking Functions")
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")

    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly")
    Dim correct As Boolean = hunspell.Spell("Recommendation")
    Console.WriteLine("Recommendation is spelled " & (If(correct, "correctly", "incorrectly")))

    Console.WriteLine("")
    Console.WriteLine("Make suggestions for the word 'Recommendatio'")
    Dim suggestions As List(Of String) = hunspell.Suggest("Recommendatio")
    Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
    For Each suggestion As String In suggestions
        Console.WriteLine("Suggestion is: " & suggestion)
    Next

    Console.WriteLine("")
    Console.WriteLine("Analyze the word 'decompressed'")
    Dim morphs As List(Of String) = hunspell.Analyze("decompressed")
    For Each morph As String In morphs
        Console.WriteLine("Morph is: " & morph)
    Next

    Console.WriteLine("")
    Console.WriteLine("Find the word stem of the word 'decompressed'")
    Dim stems As List(Of String) = hunspell.Stem("decompressed")
    For Each stem As String In stems
        Console.WriteLine("Word Stem is: " & stem)
    Next

    Console.WriteLine("")
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
    Dim generated As List(Of String) = hunspell.Generate("girl", "boys")
    For Each word As String In generated
        Console.WriteLine("Generated word is: " & word)
    Next
End Using

Hyphenation: Hyphen

Using Hyphen to hyphenate is straightforward: just create a Hyphen object and call Hyphenate(). The returned HyphenResult supports both simple hyphenation and complex hyphenation with text replacements, such as the old German spelling rule that hyphenates 'ck' as 'k-k'. For further details, refer to the documentation.

A C# sample of hyphenation with Hyphen:

using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
{
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'"); 
    HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord ); 
}

A Visual Basic sample of hyphenation with Hyphen:

Using hyphen As New Hyphen("hyph_en_us.dic")
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
    Dim hyphenated As HyphenResult = hyphen.Hyphenate("Recommendation")
    Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)
End Using
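
A hyphenated word is typically used to break a word at the end of a line. The following is a minimal sketch of that step for simple hyphenation only (no text replacements). The helper `HyphenBreak.SplitToFit` is hypothetical and not part of NHunspell. It takes the marker character as a parameter because the marker used in HyphenatedWord is an assumption here ('=' in the usage below); check the actual output of your dictionary. The input string in the usage is illustrative, not verified Hyphen output.

```csharp
using System;

// Hypothetical helper (not part of NHunspell): picks a line break position
// from a word whose hyphenation points are marked with a marker character.
public static class HyphenBreak
{
    // Returns { head + "-", tail } where head is the longest run of leading
    // syllables that fits into maxChars including the visible hyphen.
    // Returns null if not even the first syllable fits or there is no break.
    public static string[] SplitToFit(string marked, int maxChars, char marker)
    {
        string[] syllables = marked.Split(marker);
        string head = "";
        int used = 0; // number of syllables moved into head

        // The last syllable always stays in the tail, hence Length - 1.
        for (int i = 0; i < syllables.Length - 1; i++)
        {
            // +1 reserves room for the hyphen at the end of the line.
            if (head.Length + syllables[i].Length + 1 > maxChars)
                break;
            head += syllables[i];
            used = i + 1;
        }

        if (used == 0)
            return null;

        string tail = string.Join("", syllables, used, syllables.Length - used);
        return new[] { head + "-", tail };
    }
}
```

For example, `HyphenBreak.SplitToFit("Rec=om=men=da=tion", 8, '=')` yields "Recom-" and "mendation", while a width of 3 yields null because no break fits.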

Finding synonyms with thesaurus: MyThes

With the thesaurus MyThes, it is quite easy to find synonyms for a given word or phrase. Just create a MyThes object and call Lookup().

Often, only the stem forms of words are part of the thesaurus dictionary. If you provide a Hunspell object, a derived word like 'Girls' is stemmed to 'girl', and the synonyms are generated in the form of the original word, like 'misses', 'women', 'females', and not 'miss', 'woman', 'female'. In combination with the stemming and generation functions of Hunspell, MyThes is a real Swiss Army knife for finding synonyms. The sample shows this feature, but you can also try it on the ASP.NET demonstration project: Spell Check, Hyphenation, and Thesaurus Online.

A C# sample of a synonym lookup in the thesaurus with MyThes:

using( MyThes thes = new MyThes("th_en_us_new.idx","th_en_us_new.dat"))
{
    using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
    {
        Console.WriteLine("Get the synonyms of the plural word 'cars'");
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
        Console.WriteLine("hunspell generates the plural forms " + 
                          "of the synonyms via Generate()");
        ThesResult tr = thes.Lookup("cars", hunspell);
        
        if( tr.IsGenerated )
            Console.WriteLine("Generated over stem " + 
              "(The original word form wasn't in the thesaurus)");
        foreach( ThesMeaning meaning in tr.Meanings )
        {
            Console.WriteLine();
            Console.WriteLine("  Meaning: " + meaning.Description );

            foreach (string synonym in meaning.Synonyms)
            {
                Console.WriteLine("    Synonym: " + synonym);
            }
        }
    }
}

A Visual Basic sample of a synonym lookup in the thesaurus with MyThes:

Using thes As New MyThes("th_en_us_new.idx", "th_en_us_new.dat")
    Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
        Console.WriteLine("Get the synonyms of the plural word 'cars'")
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
        Console.WriteLine("hunspell generates the plural forms " & _ 
                          "of the synonyms via Generate()")
        Dim tr As ThesResult = thes.Lookup("cars", hunspell)

        If tr.IsGenerated Then
            Console.WriteLine("Generated over stem " & _ 
               "(The original word form wasn't in the thesaurus)")
        End If
        For Each meaning As ThesMeaning In tr.Meanings
            Console.WriteLine()
            Console.WriteLine("  Meaning: " & meaning.Description)

            For Each synonym As String In meaning.Synonyms
                Console.WriteLine("    Synonym: " & synonym)
            Next
        Next
    End Using
End Using

Use in commercial applications and available Dictionaries

Thanks to its LGPL and MPL licenses, NHunspell can be used in commercial applications: closed source projects are allowed to link against the NHunspell.dll assembly. NHunspell uses the Open Office dictionaries, most of which are available for free.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3).

About the Author

Thomas Maierhofer
CEO MSE-iT
Germany
I'm the CEO of MSE-iT Reisebürosoftware, a software development company in Germany making travel agency accounting software.
 
We release some of our libraries under the GPL or LGPL so that everybody can use them. These are our Open Source projects so far:
 
Spell Checking, Hyphenation and Thesaurus for .NET

.NET spell checker, hyphenation, and thesaurus based on the Open Office spell checker Hunspell. Here is a live demo: Spell check, hyphenation and thesaurus reference project for NHunspell on ASP.NET.
 
jQuery Plugins

jQuery Background Canvas Plugin: inserts an HTML5 CANVAS element behind any HTML element and allows drawing on it with JavaScript.

Comments and Discussions

 
Question: NHunspell NuGet Package (Thomas Maierhofer, 11 Jun '12 - 13:02)
NHunspell is now available as a NuGet package: http://nuget.org/packages/NHunspell.

Question: MyThes: Not able to load dictionary if there has been one loaded before (kom2005, 26 Sep '11 - 9:50)
Hi,

within my application I would like the user to switch thesaurus dictionaries.
 
While I got this working for Hunspell objects for spell checking, I have no clue how to do that for a MyThes object:
 
public MyThes nhThesaurus;
nhThesaurus.Load(something); // works
nhThesaurus.Load(anotherone); // does not work, because it has loaded one already

nhThesaurus.Dispose(); // No dispose available to free the object
 
Any ideas on that?
 
Thanks!

Question: Excellent (Alan Burkhart, 8 Sep '11 - 10:03)
I am SO GLAD I read this before trying to build my own spell checker. :)
Alan Burkhart

Question: Plural form (radu20, 29 Jun '11 - 23:53)
Hi,
 
Do you know if there is a way to detect if a word is a plural form?
For example for car I want to show only the synonyms that are plural. For example I don't want to show automobile only automobiles.
 
Thanks a lot,
Radu

Answer: Re: Plural form (Thomas Maierhofer, 30 Jun '11 - 0:48)
This can possibly be done with the stemming functions. Stemming delivers the word stem.

Answer: Re: Plural form (Thomas Maierhofer, 18 Mar '13 - 11:29)
You can use the stemming functions of Hunspell.

Blog: MSE-iT Software Development (http://blog.mse-it.de/)
Homepage: Thomas Maierhofer Software Development (http://www.maierhofer.de/)
Homepage: MSE-iT Reisebürosoftware (Travel Agency Accounting Software) (http://www.mse-it.de/)
Forum: Reisebüro-Forum (Travel Agency Forum) (http://www.reisebueroforum.de/)

General: NHunspell Version 0.9.5 released (Thomas Maierhofer, 18 Jul '10 - 23:08)
You can download the new NHunspell version 0.9.5 from Sourceforge:
http://sourceforge.net/projects/nhunspell/files/
 
Best regards Thomas

Question: Add words? [modified] (Member 4544558, 2 Dec '09 - 2:40)
Thank you very much for this library.
It works well with my VB app.
I use it to get spelling suggestions and synonyms when I click on a word.
 
Is it possible to create custom dictionaries where you can add words,
so that it will work like MS Word's spell check function with the User.dic?
 
Thanks again
 
Peter
 
Added:
 
If HunSpell doesn’t use custom dictionaries, one way to do it is this:
 
Find the path to Word’s user.dic.
Read it into an ArrayList.
Check the word with HunSpell.
If HunSpell doesn’t know the word, go through the ArrayList.
If the word is there, set Correct to true. If it isn’t, set Correct to false
and add ‘Add word’ to the dropdown list with the suggestions from HunSpell.
If the user adds the word, then add it to the ArrayList and sort it.
Then save the ArrayList as user.dic.
 
Anyway that’s the way I did it because I want my user.dic to be kept up to date.
If you don’t have MS Word, you can make your own text file.
 
modified on Friday, December 4, 2009 1:48 AM

Answer: Re: Add words? (Thomas Maierhofer, 3 Dec '09 - 21:33)
You can use Add() and AddWithAffix() to add your words to an already created Hunspell object. The dictionary files are not modified, so this addition must be done every time you create a Hunspell object. You can store your own dictionary wherever you want and add the words from your dictionary after you create a Hunspell object. After that you can spell check with your own words in the dictionary.
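
A minimal sketch of this approach, assuming the user's words are kept in a plain text file with one word per line. The file format and the `UserDictionary` helper are hypothetical and not part of NHunspell; only the `Add()` call on an existing Hunspell object is taken from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using NHunspell;

// Hypothetical helper (not part of NHunspell): keeps user words in a text
// file and re-adds them each time a Hunspell object is created.
public static class UserDictionary
{
    // Reads one word per line; blank lines and surrounding spaces are ignored.
    // A missing file is treated as an empty user dictionary.
    public static List<string> ReadWords(string path)
    {
        var words = new List<string>();
        if (!File.Exists(path))
            return words;

        foreach (string line in File.ReadAllLines(path))
        {
            string word = line.Trim();
            if (word.Length > 0)
                words.Add(word);
        }
        return words;
    }

    // Adds the words to an already created Hunspell object. The dictionary
    // files on disk are not changed, so call this after every construction.
    public static void Apply(Hunspell hunspell, IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            hunspell.Add(word);
        }
    }

    // Stores a newly accepted word so it is available in the next session.
    public static void Append(string path, string word)
    {
        File.AppendAllText(path, word + Environment.NewLine);
    }
}
```

After creating the Hunspell object, a call such as `UserDictionary.Apply(hunspell, UserDictionary.ReadWords("user.txt"))` would make the stored words known to the spell checker; the file name is only an example.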
 

General: Re: Add words? (Member 4544558, 3 Dec '09 - 22:29)
Thank you very much. I hadn't thought of that.
I use Add now for 'Ignore All' because they aren't really added.
But I can see that I can use it for both purposes. Great.
I only have to change a couple of lines.
 
Have you got any plans for additions to NHunSpell?
 
Have a nice weekend
 
Peter

Last Updated 16 Nov 2009
Article Copyright 2009 by Thomas Maierhofer