
A Spell Checking Engine

By Matt Gullett, 5 Feb 2001
 
  • Download demo project - 85 Kb
  • Download the current US English Dictionary - 820 Kb
  • Description:

    This project is an evolution of the spell checking engine project I submitted earlier. This project includes numerous enhancements to the core spelling engine plus the addition of a "check-as-you-type" edit control and the related support dialogs (see above).

    This project is not complete; it is a work in progress. There are numerous issues with the current version which need to be addressed. My long term goal is to develop this to "commercial quality".

    I am going to continue to improve this engine toward my goal. I will continue to post updates as I feel necessary.

    Changes from previous version:

    1. Reorganized class architecture. Added CFPSSpellCheckEngineOptions, CFPSDictionary to core engine.
    2. Created CFPSSpellingEditCtrl CEdit derived class to implement "check-as-you-type" edit control.
    3. Created options property pages (see above)
    4. Created spelling dialog (see above)
    5. Created common-use dictionary. (Download available above)
    6. Updated US English dictionary w/improved word list + proper names.
    7. Incremental changes to the MetaphoneEx function.
    8. Addition of the EditDistance function.
    9. Added support for case-sensitive dictionary entries.
    10. Added file header to dictionaries.

    To-do-list:

    1. Re-write EditDistance algorithm for performance.
    2. Research compression options for dictionaries
    3. Create ATL ActiveX control from edit control
    4. Create COM based dictionary support for language independence
    5. Create Rich Edit "check-as-you-type" control.
    6. Add auto-correct capability.
    7. Add sentence begin recognition for automatic upper case decisions.
    8. Continue to improve US English dictionary.
    9. Implement a binary-search mechanism on dictionary look-ups.
    10. Continue to improve MetaphoneEx function.
    11. Create C# (.net) version (when C# stabilizes)
    12. ETC...
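
    To illustrate to-do item 9, here is a minimal sketch of a binary-search look-up. It assumes the dictionary words are held in memory as a sorted list; the ContainsWord name and the std::vector storage are assumptions made for illustration and are not part of the engine.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of the engine): returns true if `word`
// is present in `sortedWords`.
// Precondition: `sortedWords` is sorted in ascending order by operator<.
bool ContainsWord(const std::vector<std::string>& sortedWords,
                  const std::string& word)
{
    // std::binary_search performs O(log n) comparisons on a sorted range.
    return std::binary_search(sortedWords.begin(), sortedWords.end(), word);
}
```

    The word list must be sorted with the same ordering that the search uses, so a case-sensitive dictionary would need to be sorted case-sensitively as well.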

    Classes:

    CFPSSpellCheckEngine This is the core spelling engine. It is intended to be language independent, although it currently is not. This engine encapsulates the functionality of managing dictionaries, making suggestions (through dictionaries), and maintaining spelling options.
    CFPSSpellCheckEngineOptions Support class for CFPSSpellCheckEngine which implements support for storing, saving and loading spell checking options. Currently, this uses a serialized file to store options, but could easily be changed to INI file or registry.
    CFPSDictionary Base dictionary class. Defines a set of virtual functions generic to all dictionaries. Also, provides base implementation of all virtual functions based on current requirements. This class uses a defined file structure w/ a file header and any number of dictionary records. Future derivations of this class will provide language specific support.
    CDlgSpellChecker CDialog derived class which implements the spell checker dialog. Currently the undo support is based on edit control undo support, not spell checker undo. Need to improve this further.
    CPrShtSpellOptions CPropertySheet derived class which implements the spell checking engine's property sheet.
    CPrPgeSpellOptions_General CPropertyPage derived class implementing general options panel.
    CPrPgeSpellOptions_User CPropertyPage derived class implementing user dictionary options panel.
    CPrPgeSpellOptions_Common CPropertyPage derived class implementing common misspellings options panel.

    Support functions of importance:

    void CheckSpellingEdit (CFPSSpellCheckEngine* pEngine, CEdit* pEdit)
    This function is called when a user presses the F7 (or configured) hot key from within the "check-as-you-type" edit control. It displays the CDlgSpellChecker dialog box.
    void CheckSpellingRich (CFPSSpellCheckEngine* pEngine, CRichEditCtrl* pEdit)
    This function is the rich edit counterpart of CheckSpellingEdit. It is intended to be called when a user presses the F7 (or configured) hot key from within a "check-as-you-type" rich edit control, and to display the CDlgSpellChecker dialog box.

    NOTE: This function is not currently being used because the rich edit control is not complete.

    int EditDistance(const char *szWord1, const char *szWord2) This function takes two words and returns an approximation of the minimum number of changes a user would need to make for the two words to match. This function is not a true edit-distance algorithm, but is a customized algorithm for this spell checking application.
    void MetaphoneEx(const char *szInput, char *szOutput, int iMaxLen) This function is passed a word and returns (through the szOutput parameter) a modified-metaphone representation of the word. This is a variation on the algorithm originally written by Lawrence Philips. A newer version of his algorithm (double-metaphone) is also available. I have tested the double-metaphone algorithm with the spell checking engine and was not impressed with the results. It does provide fast results and a high hit-rate, but it also returns far too many results (on average). However, I am considering using it in conjunction with the EditDistance algorithm and will review this further.
    void SortMatches(LPCSTR lpszBadWord, CStringList &Matches) This function sorts a list of word suggestions based on the approximate edit-distance between the words in the list and the misspelled word passed in as lpszBadWord.
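
    For readers unfamiliar with edit distance, the sketch below shows the classic Levenshtein distance and a sort of suggestions by that distance. It is not the customized EditDistance algorithm described above, and it uses std::string and std::vector instead of the engine's char pointers and CStringList; the names LevenshteinDistance and SortByDistance are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn `a` into `b`.
// NOTE: this is a textbook algorithm, not the engine's EditDistance.
int LevenshteinDistance(const std::string& a, const std::string& b)
{
    // prev holds the previous row of the distance table, curr the current row.
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<int>(j);   // distance from "" to b[0..j)

    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        curr[0] = static_cast<int>(i);   // distance from a[0..i) to ""
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            int deletion     = prev[j] + 1;
            int insertion    = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        prev.swap(curr);                 // the current row becomes the previous row
    }
    return prev[b.size()];
}

// Hypothetical stand-in for SortMatches: orders suggestions so that the
// words closest to `badWord` come first. Ties keep their original order.
void SortByDistance(const std::string& badWord,
                    std::vector<std::string>& matches)
{
    std::stable_sort(matches.begin(), matches.end(),
        [&badWord](const std::string& lhs, const std::string& rhs) {
            return LevenshteinDistance(badWord, lhs) <
                   LevenshteinDistance(badWord, rhs);
        });
}
```

    This version recomputes the distance on every comparison, which is simple but wasteful; computing each distance once and sorting (distance, word) pairs would be the obvious refinement for long suggestion lists.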

    Architecture:

    CORE ENGINE

    The core spell checking engine consists of three classes: CFPSSpellCheckEngine, CFPSSpellCheckEngineOptions and CFPSDictionary. These classes provide support for dictionary related functions such as adding a word, removing a word, ignoring a word, loading a dictionary, saving a dictionary, checking whether a word is in the dictionary, suggesting possible matches, etc.

    The core engine is implemented as a strict back-end engine. It has no user-interface components. Most of the functions exposed by these classes where an error might occur return an int return code. These return codes are defined in 1) FPSSpellCheckerInclude.h and 2) the header file for a given class. The return codes should always be examined to determine the completion status of these functions.
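
    The return-code convention can be sketched as follows. The constants and the AddWordSketch function are hypothetical and exist only for illustration; the real codes are defined in FPSSpellCheckerInclude.h and the class headers and are not reproduced here.

```cpp
#include <set>
#include <string>

// Hypothetical return codes, for illustration only. The engine's real
// values are defined in FPSSpellCheckerInclude.h and the class headers.
const int SKETCH_OK              = 0;
const int SKETCH_ERR_EMPTY_WORD  = 1;
const int SKETCH_ERR_DUPLICATE   = 2;

// Hypothetical back-end style function: it has no user interface and
// reports its completion status through an int return code.
int AddWordSketch(std::set<std::string>& dictionary, const std::string& word)
{
    if (word.empty())
        return SKETCH_ERR_EMPTY_WORD;      // nothing to add
    if (!dictionary.insert(word).second)
        return SKETCH_ERR_DUPLICATE;       // word was already present
    return SKETCH_OK;
}
```

    A caller compares the result against the success code before continuing, and leaves any message box or other user feedback to the user-interface layer.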

    Special care has been taken to ensure that these classes are very stable and robust. Also, performance considerations weigh heavily on the implementation of these classes. Very little MFC code is used in these classes and functions.

    CHECK-AS-YOU-TYPE EDIT CONTROL

    The check-as-you-type edit control is contained in the CFPSSpellingEditCtrl class. It is derived off of CEdit and works by subclassing an existing edit control through the AttachEdit function.

    To improve performance, this control implements a timer and, whenever there is no user activity (typing, mouse clicking, scrolling, etc.), checks the spelling of the displayed portion of the edit control. The function RedrawSpellingErrors is called to perform the checking. It checks only the displayed portion of the edit control and calls DrawSpellingError for each displayed word. If a word is not found in the dictionary, this function calls DrawSquigly to draw the squiggly underline for the word. DrawSquigly creates a structure of type FPSSPELLEDIT_ERRORS and adds it to the m_SpellingErrors member list.
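
    The word scan over the displayed text can be sketched without any MFC code. This is a simplified illustration, not the control's implementation: the SpellingErrorSketch structure and FindSpellingErrors function are hypothetical names, words are taken to be runs of letters only, and the dictionary is assumed to hold lower-case entries.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one misspelled word: where it starts in the
// scanned text and how many characters it spans.
struct SpellingErrorSketch
{
    std::size_t start;
    std::size_t length;
};

// Scans `visibleText` word by word and returns the position of every
// word that is not found in `dictionary` (lower-case entries assumed).
std::vector<SpellingErrorSketch> FindSpellingErrors(
    const std::string& visibleText,
    const std::set<std::string>& dictionary)
{
    std::vector<SpellingErrorSketch> errors;
    std::size_t i = 0;
    while (i < visibleText.size())
    {
        // Skip anything that is not a letter (spaces, digits, punctuation).
        while (i < visibleText.size() &&
               !std::isalpha(static_cast<unsigned char>(visibleText[i])))
            ++i;

        // Collect one word, lower-casing it for the dictionary look-up.
        std::size_t start = i;
        std::string word;
        while (i < visibleText.size() &&
               std::isalpha(static_cast<unsigned char>(visibleText[i])))
        {
            word += static_cast<char>(
                std::tolower(static_cast<unsigned char>(visibleText[i])));
            ++i;
        }

        if (!word.empty() && dictionary.count(word) == 0)
            errors.push_back({start, word.size()});
    }
    return errors;
}
```

    In the control itself, each recorded position would then be converted to screen coordinates so that the underline can be drawn beneath the word; that step depends on the edit control and is left out here.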

    The OnRButtonDown function checks the m_SpellingErrors to determine when to display the normal popup menu and when to display the spell check popup menu. Suggestions returned from the core engine are sorted using the SortMatches function to display them in order of edit-distance.

    The PreTranslateMessage checks for a hot key (defaults to F7). This can be customized by calling the SetHotKey static member function. When the hot key is pressed the CheckSpellingEdit function is called to display the spell checking dialog box.

    SPELL CHECK DIALOG BOX

    The spell checking dialog box is implemented in the CDlgSpellChecker class. This is a standard CDialog derived class based on the IDD_SPELL_CHECK dialog resource.

    The spell checking dialog is modelled after the Microsoft Word implementation of spell checking. It is laid out the same and functions (for the most part) the same. This dialog searches an edit control (or rich edit control) for misspelled words and displays the sentence containing each one with the misspelled word highlighted.

    Suggestions returned from the core engine are sorted using the SortMatches function to display them in order of edit-distance.

    How to use the demo:

    1. Unzip the provided file into a directory (be sure to extract the sub directories.)
    2. Make sure that the USMain.dic file is in the \Release directory.
    3. Make sure that the USCommon.dic file is in the \Release directory.
    4. Execute the FPSSpellChecker.exe from the \Release directory.

    How to incorporate the spell checker into an application:

    1. In your application's InitInstance function, add a call to the CFPSSpellingEditCtrl::InitSpellingEngine(NULL) static member function; OR, instead of NULL, pass in a string containing a fully qualified path to a spell checking engine options file.
    2. In your application's ExitInstance function, add a call to the CFPSSpellingEditCtrl::Terminate static member function.
    3. Add the following files to your project.
      DlgSpellChecker.cpp DlgSpellChecker.h
      DlgSpellingEditCtrl.cpp DlgSpellingEditCtrl.h
      FPSDictionary.cpp FPSDictionary.h
      FPSSpellCheckEngine.cpp FPSSpellCheckEngine.h
      FPSSpellCheckEngineOptions.cpp FPSSpellCheckEngineOptions.h
      FPSSpellCheckerInclude.cpp FPSSpellCheckerInclude.h
      FPSSpellingEditCtrl.cpp FPSSpellingEditCtrl.h
      PrPgeSpellOptions_Common.cpp PrPgeSpellOptions_Common.h
      PrPgeSpellOptions_General.cpp PrPgeSpellOptions_General.h
      PrPgeSpellOptions_User.cpp PrPgeSpellOptions_User.h
      PrShtSpellOptions.cpp PrShtSpellOptions.h
    4. Copy the following resource items to your project.
      IDD_SPELL_CHECK
      IDD_SPELL_OPTION_COMMON
      IDD_SPELL_OPTION_GENERAL
      IDD_SPELL_OPTION_USER
    5. Include the "FPSSpellCheckerInclude.h" file in your stdafx.h file.
      #include "FPSSpellCheckerInclude.h"
    6. Place a standard edit control on a form or dialog resource and give it a unique control id (e.g. ID_TEST_EDIT).
    7. Add a member variable of type CFPSSpellingEditCtrl to the dialog/form class file (e.g. m_editTest).
    8. In the OnInitDialog function, call the AttachEdit member function of CFPSSpellingEditCtrl (e.g. m_editTest.AttachEdit(this, ID_TEST_EDIT);).

    Known Issues

    1. Performance is still not as good as it needs to be.
    2. Language support is limited to US English.
    3. The EditDistance function needs work.
    4. The MetaphoneEx function needs work.
    5. There is a painting problem with the edit control when scrolling the control while the spelling error "squigly" lines are displayed.
    6. No complete support for rich edit control.

    License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


    About the Author

    Matt Gullett
    Web Developer
    United States


    Comments and Discussions

     
    General: Re: Commercial version released | Anonymous | 12 Sep '04 - 21:03
    Here's a more robust spell engine with many language dictionaries available: http://www.wintertree-software.com
    General: Re: Commercial version released | Thomas Holz | 14 Feb '11 - 0:38
    The website is offline. No commercial version any more?
     
    Are there any other dictionaries available? German version perhaps? Because I'm still looking for a better one for my auto-correction software.
     
    (Sure, Hunspell might be an alternative, but is quite difficult to handle with the different dictionary licenses...)
    Question: Can I use your dictionary on my school project? | tomc | 7 Apr '03 - 16:23
    Is it copyrighted? Where can I get permission? Thanks
    Answer: Re: Can I use your dictionary on my school project? | Matt Gullett | 7 Apr '03 - 16:28
    Yes, you can use it. I am the copyright holder.
     
    If you do use it, please give me credit where appropriate.
    General: NEED OF UK DICTIONARY | laksammu | 27 Dec '02 - 0:21
    Hi,
    I got your code for the Spell Checking Engine (FPS SpellCheck Engine) from CodeProject.com. It is working fine. Now I need the same code with a UK dictionary. Can you mail me a UK dictionary similar to the US dictionary so that I can use it instead? I tried different options but in vain. Expecting a positive mail from you soon.
     
    Thanks.

     
    LaksAmmu
    General: Help Needed | Pushparaj | 12 Aug '02 - 23:35
    Hi
    I'm using the same spell checker. If the font size is changed and a word is misspelled, the line under the misspelled word is not drawn properly. If the font size is bigger than the default font, the line is drawn over the word itself. So please help me solve the problem.
    Thanks
    Pushparaj
    General: Double Metaphone and Related Algos | Suresh Limaye | 17 Apr '02 - 9:56
    I want to understand these algos. Are they available on the web with code examples? Can I get the latest version of the application? I am interested only in the algorithms :-D
    General: Re: Double Metaphone and Related Algos | Mike Nordell | 17 Apr '02 - 10:06
    Then I think you should have a look at aspell/pspell by Kevin Atkinson. I think it's hosted at SourceForge, but a Google search should tell you.
    General: Re: Double Metaphone and Related Algos | Matt Gullett | 18 Apr '02 - 0:26
    Thanks for the interest.
     
    Sorry for the delay in my response, but I am not getting notification emails off this article for some reason.
     
    If I'm not mistaken, Lawrence Philips developed the original Metaphone and Double Metaphone algorithms. There is a good article on CUJ about it. Here is the link:
     
    http://www.cuj.com/articles/2000/0006/0006d/0006d.htm?topic=articles
     
    (One note about this article, the MString class provided does not perform well w/high volume systems.)
     
    There are numerous implementations of the metaphone algorithm; many have small variations on the original. Most are used for spell checkers or similar apps.
     
    Suresh Limaye wrote:
    Can I get latest version of the application ?
     
    Sorry, I don't have an updated version of this available at the moment. My current "work-in-progress" for this project is quite involved. I am converting the core system to COM, adding language support, improving performance and matching (modified double metaphone, etc.). Also, I am implementing a web-interface for the checker. It will probably be at least a couple of months before I have a version of this ready to post to CodeProject.

    General: Other languages | PBC | 17 Mar '02 - 6:31
    Where can I get dictionaries in other languages? :confused: X|
     
    PBC
    General: Re: Other languages | Matt Gullett | 18 Apr '02 - 0:31
    Thanks for the interest.
     
    Sorry for the delay in my response, but I am not getting notification emails off this article for some reason.
     
    I am not sure where the best place is to find word lists for the various languages. The few that I have to work with have been scavenged from various places.
     
    You probably know this already, but this particular version of the spell checker will probably not work very well with non-English languages without at least some modifications. I am working on an update to this project (major update), but it will be a couple of months before it is ready.
    General: Re: Other languages | Hockey | 7 Oct '04 - 23:14
    Speaking of dictionary word lists :)
     
    Would you mind if I used your dictionary file in a PHP project??? My host offers PSpell, but no dictionary files... :O
     
    So I'm stuck having to find a dictionary list...and yours is the only text version and thorough enough for my needs...
     
    Matt Gullett wrote:
    few that I have to work with have been scavenged from various places.
     
    Does this mean if you say no, I have to scour various dictionary sources? What confuses me is...how then is it decided one has breached copyright laws. So long as you don't copy a database verbatim and only one term here and another term there...it's ok???
     
    Cheers :)
     
    How do I print my voice mail?
    General: Trouble installing... | Cless Averin | 13 Dec '01 - 23:45
    I like the look of the spell checker you've made and would love to include it in my program, but I'm ... well kinda a moron sometimes. I'm currently having a little trouble adding it to my program, see my program is an SDI (CEdit derived), so I can't seem to add member variables to the document. Any help would be greatly appreciated. Thank you in advance.
    Question: How do I get change notifications back? | Gary A. Lucero | 13 Dec '01 - 10:11
    In my main dialog where the CEdit control is, I have an OnChange() member function. I use it to tell me that the contents of the edit box have changed.
     
    Now that FPSSpellingEditCtrl steals OnChange(), how do I get this notification?
     
    Thanks,
    Gary A. Lucero
    glucero@swcp.com
    http://freebask.homestead.com
    Answer: Re: How do I get change notifications back? | Adam Partridge | 2 Sep '02 - 4:26
    Change the CEdit derived class's ON_CONTROL_REFLECT to ON_CONTROL_REFLECT_EX and return FALSE from the CEdit derived class's OnChange handler - see MSDN for information on ON_CONTROL_REFLECT_EX.
    General: help | ofir | 6 Dec '01 - 21:35
    Hi everyone ,
     
    I have two questions :
     
    1.   I downloaded this project yesterday and when I try to run it in debug mode (in order to examine the files) I get ASSERTs all the time. Does anybody have a clue why? (I have VC6 and Win98)
     
    2.   Does somebody know where I can find material about spell checkers and the way they work? (any material would help)
     
    Thanks in advance

     
    ofir
    General: Re: help | sandslash | 4 Oct '02 - 10:10
    Hi there,
    I also had assert problems when running in debug. It appears that the file USUser.dic that is in the Release folder is not in the Debug folder. Copy the file USUser.dic from the Release folder to the Debug folder and it works. Thanks to the author for liberal use of the TRACE function!
     
    --steve :rolleyes:
    General: Memory Problem | bbunting22 | 30 Nov '01 - 23:17
    Terminate() causes a heap corruption problem. When the app exits you get a heap overrun message. Any idea or fix?
    General: Re: Memory Problem | bbunting22 | 30 Nov '01 - 23:31
    I guess the problem only occurs if you don't provide an option file to InitSpellingEngine() i.e.
    InitSpellingEngine(NULL) causes the problem but
    InitSpellingEngine(szFileName) is OK.
     
    ...?
    General: BIG bug when in vertical scroll edit box | kris_pl | 29 Nov '01 - 17:44
    Just in case you weren't aware of this, when a vertical scroll edit box has misspelled text that is out of view, the squiggly lines are still drawn off the edit box.
     
    Any idea how I can quickly fix that, if you haven't already?
     
    thanks!
     
    kp
     
    what?
    General: little fix to 'BIG' bug | kris_pl | 7 Dec '01 - 11:06
    Add these lines to DrawSquigly:
     
    CRgn rgn;
    rgn.CreateRectRgnIndirect(&ClientRect);
    dc.SelectClipRgn(&rgn);
     
    The full function then becomes:
     
    void CFPSSpellingEditCtrl::DrawSquigly(CDC &dc, int iLeftX, int iWidth, int iY)
    {
        int iCurrentY = iY;
        int iCurrentX = iLeftX;
     
        CRect ClientRect;
        GetClientRect(ClientRect);
     
        // Clip drawing to the client area so that nothing is drawn outside the edit box.
        CRgn rgn;
        rgn.CreateRectRgnIndirect(&ClientRect);
        dc.SelectClipRgn(&rgn);
     
        while (iCurrentX <= iLeftX + iWidth)
        {
            dc.MoveTo(iCurrentX, iY);
            // iY is a vertical coordinate, so it is checked against Height() here;
            // the code as posted compared it against Width().
            if (iCurrentX+2 <= iLeftX + iWidth && iY+2 < ClientRect.Height())
                dc.LineTo(iCurrentX+2, iY+2);
            if (iCurrentX+4 <= iLeftX + iWidth && iY+2 < ClientRect.Height())
                dc.LineTo(iCurrentX+4, iY);
     
            iCurrentX += 3;
        }
    }
     
    what?
    General: Append to View | Jonny Newman | 18 Oct '01 - 7:51
    Hi, how would I get the spell checker engine running in a non-dialog based app, i.e. one using full Doc/View?
    Nice work on this demo app! :)
    General: CRichEditView | -= Matt Newman =- | 20 Sep '01 - 10:52
    How would you use it in CRichEditView?
     
    -Matt Newman :suss:
    General: Re: CRichEditView | Matt Gullett | 21 Sep '01 - 3:46
    I have been working on this version and it is a little more involved than the CEdit implementation. Basically, you must disable updates, use SetSel to select each word, check it, and when you are done reset the selection to the original position and enable updates. Where it gets complicated is calculating where to draw the squiggly lines. You have to get the charformat, create a compatible DC, determine the text height and from that calculate the position of the line. This can be problematic when a single word uses different fonts for different characters.
     
    I am glad to see that someone is using or at least looking at this project. Unfortunately, due to my workload I have not been able to invest much time in completing this spell checker. I am hopeful that an upcoming project will require a spell checker and I will be able to finish it then.

    General: Re: CRichEditView | -= Matt Newman =- | 21 Sep '01 - 16:07
    Ya I was kind of hoping I could use it in CRichEditView but I guess I will just have to do some work on it, my current project is kind of boring me.
     
    -Matt Newman :suss:
    General: Re: CRichEditView | George Clarence | 7 Dec '01 - 2:17
    I have the same question. So, if you find any thing, please tell me.
     
    Best regards
    George Clarence :eek:
    General: Re: CRichEditView | George Clarence | 7 Dec '01 - 2:19
    Sorry, my correct email address is george_clarence@yahoo.co.uk :eek:
    General: Faster | Anup Joshi | 25 Jul '01 - 17:28
    Make your engine much faster and try to make it an OCX.
    General: App crashes | Anonymous | 7 Feb '01 - 20:42
    The application crashes on close. The bug is in the following function:
     
    void CFPSSpellingEditCtrl::Terminate()
    {
        if (m_pEngine)
        {
            ASSERT(m_pEngine->GetUserDic());
            ....
     
    The ASSERT above reports the error in debug mode, but there is no protective code that handles the error. After the ASSERT above you should add:
     
    if (m_pEngine->GetUserDic())
    {
     
    }
     
    Regards,
    Miroslav Rajcic
    http://www.spacetide.com
    General: Re: App crashes | Matt Gullett | 8 Feb '01 - 1:29
    Thanks for the note. It did not occur on my test PC because there is always a user dic. I will fix the problem ASAP.
    General: Re: App crashes | Anonymous | 17 Apr '01 - 17:34
    LOL guess not ASAP. Luckily I debugged it and ignored the assert error the 1st time through..
    Question: What languages are desired? | Matt Gullett | 7 Feb '01 - 14:49
    As the developer of this project, I am interested in knowing what languages the users of this site would like to see. Also, I am in need of word lists for the various languages.
     
    Thank You,
     
    Matt Gullett
    Answer: Re: What languages are desired? | Birch | 1 Apr '01 - 11:03
    I may be able to get some word lists for you. Send me an email (link below).
     
    I work with languages that have never been written, so I'm very interested in this project. I am having trouble finding the Common Speller API. MS seems to deny any knowledge of it.
     

    Birch

    Answer: Russian | Pavel Chuchuva | 13 Jun '01 - 17:44
    Russian, please. :)
    Answer: Re: What languages are desired? | Jeremy Falcon | 17 Jun '04 - 13:01
    I'd like to see:
     
    Mexican Spanish
    Spain Spanish
    Japanese
    Simplified Chinese
    Portuguese
     
    After that, I'd be able to use this in a app we have at work. I'd give you word lists for each language, but I haven't the first clue about finding them.
     
    Jeremy Falcon
    General: Double Metaphone | William E. Kempf | 10 Jan '01 - 9:51
    There's a newer algorithm by the author of Metaphone known as the Double Metaphone. You can find it at http://www.cuj.com/archive/1806/feature.html. I don't know how it will compare to your modified Metaphone, but you should check it out.
    General: Re: Double Metaphone | Matt Gullett | 11 Jan '01 - 2:32
    Thanks for the URL. Actually, I have already been researching the double-metaphone algorithm and I intend to implement it into the spelling engine. Some other things I have learned, though:
     
    1) Some commercial spell checkers also use a word-reduction algorithm (which keeps some vowels) to augment their search results. I have been looking at how to implement such a routine as well.
     
    2) At least MS Word (and probably others) has also developed a database of human-created word-reduction and metaphone outputs. These human-created outputs are used in their dictionaries as opposed to the computer-created ones to provide a better output. I have already gone through my USEnglish dictionary and hand-coded many words with the letter 'G' in them.
     

    General: Other Languages | Uwe Keim | 7 Jan '01 - 20:27
    Would it be enough to replace the dictionaries (e.g. English => German) to use your class in another language, or is the algorithm written around "English words"?
     
    Uwe Keim
    See me: http://www.zeta-software.de/~uwe
    General: Re: Other Languages | Matt Gullett | 8 Jan '01 - 1:49
    The matching engine uses numerous "English specific" algorithms to enhance the result list. I do not know much German, so I am not sure how well the engine will map to German. It would be worth a try, though. Probably the #1 function in the whole class which would need modification for various languages is the MetaphoneEx function. I think this function could be modified to work for German.
    General: Re: Other Languages | real name | 8 Jan '01 - 2:58
    finish English first, but know:
    English is too easy compared to German and especially to (Eastern) European/Slavic and other languages
     
    generally, one big difference is that in English there is one word form for all circumstances
    in German there are 4, we (s) have 7 object-word sub-kinds;
    in english you say: of word, about word, with word
    we say: zo slova, o slove, so slovom
     
    it is similar for other word kinds (I will not try to name them in English):
    in English you have green, in mine: zeleny (he), zelana (she), zelene (it), ... (similar in German: something like gruner, grune, grunes)
    in English more green/greener (?, stupidity, take as example only), we have zelensi; most green - najzelensi
    and combinations: about green word - o zelenom slove (German: um grune wort (?!))
     
    sometimes (very) regular, sometimes not
    etc. etc.
    knowing these complicated rules you can eliminate many duplicate/similar cases to keep the concrete database smaller
     
    keep smiling and finish english first
    t!
    General: Re: Other Languages | WoR | 8 Jan '01 - 4:18
    I guess English is the only western language (the only ones I can talk about) where a spell checker without a grammar check makes sense.
    In all other languages you would probably first reduce a word to its pre-/suffixless root(s), spell check the root(s), check if the root(s) support the pre-/suffixes and then recombine.
    Multiple roots occur in languages like German (which allows almost free combination of many words into one, a feature which is very commonly used up to three words (the combinatorics start getting big here :confused:)).
    Roots would in general not be unique (suffixes like -s -es, prefixes like a- an-).
    The suffixes are mostly implied by grammar and make up a good part of the spelling errors.
    Suffixes of different words must match (or rather the implied grammatical entities).
    Only grammar can decide if a specific word is a noun, adjective or verb (with nouns capitalized in German).
     
    So an 'English' spell checker could be used to check the roots, with some code added for the rest :suss:
     
    Wolfgang Reichl
    General: Re: Other Languages | Anonymous | 8 Jan '01 - 23:35
    In my opinion you might do the following to improve the code:
    - separate language dependent parts from language independent ones
    - break the entire code into more than one class (CDictionary abstract class, CMainDictionary, CUserDictionary, CDictionaryIndex, ...)
    - create some kind of standard dictionary format with header fields (language, creator, ...), indexing, compression (finding common word suffixes, gzip)
    - create an application for conversion of word lists into the dictionary format
     
    I think your project has big potential and lots of us are willing to help you create something really big from it.
    Also I have a lot of word lists for different languages, so contact me if you are interested in publishing them.
     
    Regards,
    Miroslav Rajcic
    http://www.spacetide.com

    General: Re: Other Languages | Sjoerd van Leent | 7 Feb '01 - 6:57
    OK, so the problem is not the Spell Checker, but the languages. As I'm coming from Holland, I know a saying in English: Double Dutch, so it is. Like the big brother of Dutch: German, Dutch has words which are male, female, multiple or no-gender. In German, you've got articles like:
     
    Der, Des, Dem, Den, Die which all do mean: THE
    and
    Das, Der, Die which all do mean: IT
     
    In Dutch it's more English-like: "De" and "Het" for IT and "De" only for THE, but if you want to make a word diminutive (give it a smaller sound), you must use "Het" in any case, even if the word has a gender.
     
    I think that if someone wants to add a new language, the English classes are of no use. I think the solution is to create different classes with words in them, like:
     
    CGenderMale, CGenderFemale, CMultiple and CNoGender. Also CNoun, and CSuffix (which can be language specific), e.g.
     
    Class CSuffix
    {
        if (Gender == "Male")
        {
            AddSuffix("er")
        }
        else if (Gender == "Female")
        .
        .
        .
    }
     
    And so on. Another class, for something specific to English, would be CWordPartCount. In this class you can set something like:
     
    if (Count < 2)
    {
        DoSuffix();
    }
    else if (Count == 2)
    {
        DoSuffixAndMoreOrLess();
    }
    else
    {
        DoMoreOrLess();
    }
     
    Get the idea? Now, if a word has only one part, only suffixes are displayed, e.g. Cool (Cooler and Coolest). For a word with two parts the suffixes and prefixes are shown, e.g. Crazy (Crazier but also More Crazy), and last, when a word has 3 parts or more, only prefixes are shown, e.g. Pathetic (More pathetic, Most pathetic)
     
    Get the drill?
     
    And there are numerous classes needed for feeding information about whether the word is Irregular or not, if it's a verb or not and so on. So I think a wide discussion is needed to get the "Perfect" Spell Checker.
     
    CString Dutch = "Double Dutch";
    General: Re: Other Languages | Peter Sjöström | 7 Feb '01 - 12:31
    Alright, this is great so far. I'd love to use this in our commercial project, and I'd love to help develop the software at no cost provided we can use it in our commercial software. BUT, it's lacking language support, as mentioned. Now, previous writers want some European languages, even Eastern European, but we'd require worldwide support including Thai, Chinese etc. and preferably also some functionality for a translation dictionary (?). Meaning there is an English text and you want to get suggestions for translated words in a second language.
     
    How does this sound? Currently I think we would not want to dig into this because it's too far off at the moment.
     
    http://www.cavena.com

    General: Re: Other Languages | Matt Gullett | 7 Feb '01 - 13:00
    Any support on this project is appreciated. I have been working on it now for about two months in my spare time and there is still a great deal to do. I have been doing some research on non-English language support, and I have learned a great deal about it. I am comfortable stating that when finished it will support multiple European languages. I have not researched languages like Chinese but I know that there would be significant requirements to make it work.
     
    That said, I am a professional developer doing this on the side and I would not feel good recommending this project for a commercial product at this time. The amount of time required to complete it and the probable availability of comparable existing products would probably lead me to look for an off-the-shelf solution.
     
    The concept of suggesting words in another language has come up before. From what I have learned, it is doable, but requires very good and exhaustive dictionaries with information on word usage, sentence patterning, etc.


    Last Updated 6 Feb 2001
    Article Copyright 2001 by Matt Gullett
    Everything else Copyright © CodeProject, 1999-2013