Posted 29 Apr 2005

Tolerant string matching using the Levenshtein algorithm

A practical example of how to use the Levenshtein algorithm for string matching



Sometimes it's handy to have a tolerant string matching function that finds almost identical strings. I needed one for our German receipt database, RkSuite, to manage categories that are sometimes simply misspelled or are variations of other categories.

Several algorithms exist to solve this problem. A very common one under Unix is Soundex, but it is only useful if you stick to one language. With a mix of English, German, French, ... words, Soundex doesn't work effectively, so a more general approach was required. After some searching with Google I came across the Levenshtein algorithm.


The code is based on the original work by Michael Gilleland. Visit his homepage and read his excellent article about the Levenshtein algorithm, then come back and join me as I show you how to use your new knowledge in your own applications.

Using the code

The class Levenshtein (levenshtein.cpp, levenshtein.h) contains the code that calculates the edit distance between two words. The edit distance is the minimum number of edit operations (insert, remove, change) necessary to transform one string into another. The code uses no sophisticated tricks, so any C++ compiler should be able to deal with it.
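To make the definition concrete, here is a minimal sketch of the classic full-matrix calculation. It is not the code from levenshtein.cpp; the function name editDistance and the use of std::string are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch (hypothetical name, not the article's Levenshtein class).
// d[i][j] holds the edit distance between the first i characters of a
// and the first j characters of b.
int editDistance(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<int> > d(n + 1, std::vector<int>(m + 1, 0));

    // Turning a prefix into the empty string takes one removal per character,
    // and building a prefix from the empty string takes one insertion each.
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({ d[i - 1][j] + 1,           // remove
                                 d[i][j - 1] + 1,           // insert
                                 d[i - 1][j - 1] + cost }); // change (or keep)
        }
    }
    return d[n][m];
}
```

For example, editDistance("kitten", "sitting") yields 3: two changes and one insertion.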

The method Get returns the edit distance. The plain value may or may not be interesting for you. In my example I use the formula Edit Distance * 100 / string length, which expresses the error as a percentage of the string length, so that longer strings tolerate more errors.
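The effect of this formula can be sketched with a small helper. The name normalizedError and the handling of an empty string are assumptions for illustration; the article's formula does not say what should happen when the length is zero.

```cpp
#include <cstddef>

// Illustrative helper (hypothetical name): edit distance as a percentage
// of the string length, using integer division as in the article's formula.
int normalizedError(int editDistance, std::size_t length)
{
    if (length == 0) {
        // Assumption: an empty reference string is a perfect match only
        // when no edits are needed, otherwise it counts as 100% error.
        return (editDistance == 0) ? 0 : 100;
    }
    return static_cast<int>(100 * static_cast<std::size_t>(editDistance) / length);
}
```

Two errors in a 10-character string give normalizedError(2, 10) == 20, while the same two errors in a 4-character string give normalizedError(2, 4) == 50. With a tolerance of, say, 25, the longer string would still match and the shorter one would not.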

Two different Get methods are available. The first works with strings up to a maximum length of MAXLINE (default: 128). It uses static arrays for the calculation, which slightly improves the speed of the algorithm. You can change the constant to save memory. Please note that a two-dimensional array is created, which requires MAXLINE * MAXLINE * sizeof(int) bytes.

If you are not sure how long your strings are, you can use the method Get2. It allocates the memory dynamically and can deal with (almost) any string length. If a string is too long for Get, Get2 is used automatically.

Usage is very simple:

// calculate Levenshtein distance
Levenshtein lev;
int nError = lev.Get(string1, string2);

// normalize: error as a percentage of the length of string1 (string1 must not be empty)
int nNormalizedError = static_cast<int>(100 * nError / strlen(string1));

The example uses a small list from the file liste.txt. You can play around with the tolerance setting to see which strings are recognized as similar. The demo project uses MFC and was compiled with Visual Studio .NET 2003. It should be easy to create a VC6 project: just add all .h, .cpp and .rc files to the project.
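What the tolerance setting does can be sketched without MFC. The following is an assumption-laden illustration, not the demo's code: findSimilar and editDistanceTwoRows are hypothetical names, the candidate list is passed in directly instead of being read from liste.txt, and the distance is computed with a two-row variant of the algorithm to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Two-row variant of the edit distance (hypothetical helper): only the
// previous and the current row of the matrix are kept in memory.
int editDistanceTwoRows(const std::string& a, const std::string& b)
{
    std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({ prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost });
        }
        prev.swap(curr);
    }
    return prev[b.size()];
}

// Returns every candidate whose normalized error (distance * 100 / length
// of the query) does not exceed tolerancePercent.
std::vector<std::string> findSimilar(const std::string& query,
                                     const std::vector<std::string>& candidates,
                                     int tolerancePercent)
{
    std::vector<std::string> matches;
    for (std::size_t k = 0; k < candidates.size(); ++k) {
        const int distance = editDistanceTwoRows(query, candidates[k]);
        int error;
        if (query.empty()) {
            // Assumption: an empty query only matches an empty candidate.
            error = (distance == 0) ? 0 : 100;
        } else {
            error = static_cast<int>(100 * static_cast<std::size_t>(distance) / query.size());
        }
        if (error <= tolerancePercent) {
            matches.push_back(candidates[k]);
        }
    }
    return matches;
}
```

With the candidates "Lebensmittel", "Lebenmittel", "Lebensmitel", "Benzin" and "Kleidung", the query "Lebensmittel" and a tolerance of 20 return the first three entries. A tolerance of 0 returns only the exact match.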

Final conclusion

The algorithm works very well for small sets of data (several thousand entries). I have tried to use it for spell checking, but comparing a string against a dictionary of typical size is not efficient. Still, switching from exact to tolerant matching makes a big difference, and your users will be glad that your program gives them a chance to make errors.


History

  • 2005-05-02 Fixed download link
  • 2005-04-29 First release for CodeProject


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



About the Author

Andreas Muegge
Web Developer
Germany

Article Copyright 2005 by Andreas Muegge