SpellCheck.net spell checking parsing using C#






Jun 20, 2002
Online spell checking using C# and regular expressions.
Introduction
SpellCheck.NET is a free online spell checking site. I visit it whenever I need to check my spelling, so I decided to write a parser for it. I wrote the parser in C# and wrapped it in a DLL called Word.dll. In this article I will show how to parse an HTML page using regular expressions, which is the main purpose of this project. I will not explain all of the source code, since it is available for download.
Before this project I had never worked seriously with regular expressions, so I decided to use them here. Along the way I learned a lot about C# regular expressions and the .NET Framework. The difficult part of the project was writing the regular expression patterns, so I referred to various sites and books to get them right.
Here are some useful sites to check out.
About Word.dll
Word.dll has one public class and two public methods:
- Public Class SpellCheck
Add a reference to Word.dll and put "using Word;" at the top of the file, then create the object.
SpellCheck word = new SpellCheck();
- Public Method CheckSpelling
This method checks the word and returns true if it is spelled correctly, otherwise false.
bool status = word.CheckSpelling("a word");
- Public Method GetSpellingSuggestions
This method returns the collection of suggested words.
foreach (string suggestion in word.GetSpellingSuggestions("a word"))
{
    System.Console.WriteLine(suggestion);
}
Parser Technique
- Connect to the spellcheck.net site and pass it the word.
- Look for the word "correctly." in the returned HTML; if it is found, return true.
- Otherwise look for the word "misspelled."; if it is found, return false.
regular expression pattern @"(correctly.)|(misspelled.)"
- If "misspelled." was found, look for the word "suggestions:" in the HTML.
regular expression pattern @"(suggestions:)"
- Then parse the text between the <blockquote> tags.
regular expression pattern @"<blockquote>(?:\s*([^<]+) \s*)+ </blockquote>"
- Finally, return the collection of suggested words.
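The steps above can be sketched as a small offline parser. This is only an illustration and not the Word.dll source: the class name SpellPageParser and the methods IsCorrect and Suggestions are hypothetical, the page layout is assumed from the steps listed above, and the suggestions are assumed to be separated by whitespace or by tags such as <br>. Two deliberate differences from the patterns quoted above: the dots are escaped so that they match only a literal period (an unescaped "." matches any character), and a simpler lazy <blockquote> pattern is swapped in for the article's pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; it works on an HTML string that has
// already been downloaded, so no network access is needed to try it out.
public static class SpellPageParser
{
    // "correctly." means the word is fine, "misspelled." means it is not.
    // The dots are escaped so they match a literal period only.
    private static readonly Regex Status =
        new Regex(@"(correctly\.)|(misspelled\.)");

    // Simplified stand-in for the article's pattern: lazily capture
    // everything between <blockquote> and </blockquote>.
    private static readonly Regex Block = new Regex(
        @"<blockquote>(.*?)</blockquote>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns true when the page reports the word as spelled correctly.
    public static bool IsCorrect(string html)
    {
        Match m = Status.Match(html);
        if (!m.Success)
            throw new FormatException("No spelling status found in the page.");
        return m.Groups[1].Success; // group 1 is the "correctly." alternative
    }

    // Returns the suggested words, or an empty list if there are none.
    public static List<string> Suggestions(string html)
    {
        var words = new List<string>();

        // Only search after the "suggestions:" marker.
        int start = html.IndexOf("suggestions:", StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return words;

        Match block = Block.Match(html, start);
        if (!block.Success)
            return words;

        // Assumption: suggestions are separated by whitespace and/or tags.
        foreach (string piece in Regex.Split(block.Groups[1].Value, @"(?:<[^>]+>|\s)+"))
        {
            if (piece.Length > 0)
                words.Add(piece);
        }
        return words;
    }
}
```

Given a page fragment shaped like "The word is misspelled. suggestions: <blockquote>yes yours use</blockquote>", IsCorrect would report false and Suggestions would yield the three words in order. Keeping the parsing separate from the download step makes the patterns easy to test against saved copies of the page.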
C# code:
The source file is included in the zip download.
Calling Word.dll wrapper class:
This is how you would call this wrapper class in your application.
using System;
// Word.dll
using Word;

/// <summary>
/// Test Harness for SpellCheck Class
/// </summary>
class TestHarness
{
    /// <summary>
    /// testing Word Class
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
        SpellCheck word = new SpellCheck();
        bool status = false;
        string s = "youes";

        Console.WriteLine("Checking for word : " + s);

        // check to see if the word is not correct
        // return the bool (true|false)
        status = word.CheckSpelling(s);

        if (status == false)
        {
            Console.WriteLine("This word is misspelled : " + s);
            Console.WriteLine("Here are some suggestions");
            Console.WriteLine("-------------------------");

            foreach (string suggestion in word.GetSpellingSuggestions(s))
            {
                System.Console.WriteLine(suggestion);
            }
        }
        else if (status == true)
        {
            Console.WriteLine("This word is correct : " + s);
        }
    }
}
Compiling:
Run the "compile.bat" file at the DOS prompt; it will create the necessary files.
Output:
This is what your screen will look like after you run TestHarness.exe.