Click here to Skip to main content
13,150,344 members (30,369 online)
Click here to Skip to main content
Add your own
alternative version


75 bookmarked
Posted 22 Mar 2006

Fast, memory efficient Levenshtein algorithm

, 26 Mar 2012
Rate this:
Please Sign up or sign in to vote.
A version of the Levenshtein algorithm that uses 2*Min(StrLen1,StrLen2) bytes instead of StrLen1*StrLen2 bytes.


The Levenshtein distance is the difference between two strings. I use it in a web crawler application to compare the new and old versions of a web page. If it has changed enough, I update it in my database.


The original algorithm creates a matrix, where the size is StrLen1*StrLen2. If both strings are 1000 chars long, the resulting matrix is 1M elements; if the strings are 10,000 chars, the matrix will be 100M elements. If the elements are integers, it will be 4*100M == 400MB. Ouch!

This version of the algorithm uses only 2*StrLen elements, so the latter example would give 2*10,000*4 = 80 KB. The result is that, not only does it use less memory but it's also faster because the memory allocation takes less time. When both strings are about 1K in length, the new version is more than twice as fast.


The original version would create a matrix[6+1,5+1], my version creates two vectors[6+1] (the yellow elements). In both versions, the order of the strings is irrelevant, that is, it could be matrix[5+1,6+1] and two vectors[5+1].

The new algorithm


1Set n to be the length of s. ("GUMBO")
Set m to be the length of t. ("GAMBOL")
If n = 0, return m and exit.
If m = 0, return n and exit.
Construct two vectors, v0[m+1] and v1[m+1], containing 0..m elements.
2Initialize v0 to 0..m.
3Examine each character of s (i from 1 to n).
4Examine each character of t (j from 1 to m).
5If s[i] equals t[j], the cost is 0.
If s[i] is not equal to t[j], the cost is 1.
6Set cell v1[j] equal to the minimum of:
a. The cell immediately above plus 1: v1[j-1] + 1.
b. The cell immediately to the left plus 1: v0[j] + 1.
c. The cell diagonally above and to the left plus the cost: v0[j-1] + cost.
7After the iteration steps (3, 4, 5, 6) are complete, the distance is found in the cell v1[m].

This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL":

Steps 1 and 2 


Steps 3 to 6, when i = 1


Steps 3 to 6, when i = 2

SWAP(v0,v1): If you look in the code you will see that I don't swap the content of the vectors but I refer to them.

Set v1[0] to the column number, e.g. 2.


Steps 3 to 6, when i = 3


Set v1[0] to the column number, e.g. 3.


Steps 3 to 6, when i = 4


Set v1[0] to the column number, e.g. 4.


Steps 3 to 6, when i = 5


Set v1[0] to the column number, e.g. 5.


Step 7

The distance is in the lower right hand corner of the matrix, v1[m] == 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and one insertion = two changes).


If you are sure that your strings will never be longer than 2^16 chars, you could use ushort instead of int, if the strings are less than 2^8 chars, you could use byte. I guess, the algorithm would be even faster if we use unmanaged code, but I have not tried it.



  • 2006-03-22
    • Version 1.0.
  • 2006-03-24
    • Detailed description of the algorithm. The code has been rewritten so that it now follows the description. :-)



This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Sten Hjelmqvist
Software Developer (Senior) SEB bank
Sweden Sweden
I work as a developer in the 'Risk control' department at SEB bank in Stockholm,Sweden and I have been designing software since the early 80's.

You may also be interested in...


Comments and Discussions

QuestionHelpful Pin
SusannePorte21-Nov-16 21:24
memberSusannePorte21-Nov-16 21:24 
AnswerRe: Helpful Pin
Sten Hjelmqvist22-Nov-16 5:37
memberSten Hjelmqvist22-Nov-16 5:37 
QuestionCleaned up the code Pin
John Henckel12-Sep-16 8:45
memberJohn Henckel12-Sep-16 8:45 
GeneralAwesome work! Pin
Member 1079435925-Sep-15 13:25
memberMember 1079435925-Sep-15 13:25 
GeneralRe: Awesome work! Pin
Sten Hjelmqvist28-Sep-15 20:49
memberSten Hjelmqvist28-Sep-15 20:49 
GeneralRe: Awesome work! Pin
Member 107943598-Oct-15 13:33
memberMember 107943598-Oct-15 13:33 
QuestionThank you very much! How can I get the detail change list? Pin
Member 1191077915-Aug-15 16:51
memberMember 1191077915-Aug-15 16:51 
BugResults are not accurate Pin
Tom Medhurst18-Jun-15 0:35
memberTom Medhurst18-Jun-15 0:35 
GeneralRe: Results are not accurate Pin
Sten Hjelmqvist5-Jul-15 20:53
memberSten Hjelmqvist5-Jul-15 20:53 
GeneralRe: Results are not accurate Pin
jfos28-Sep-15 5:39
memberjfos28-Sep-15 5:39 
QuestionWhat about the list of changes? Pin
UzairSyed10-Jun-15 3:41
memberUzairSyed10-Jun-15 3:41 
QuestionWhat about the transpose operation? Pin
Member 1164580029-Apr-15 19:45
memberMember 1164580029-Apr-15 19:45 
AnswerRe: What about the transpose operation? Pin
Sten Hjelmqvist1-May-15 1:14
memberSten Hjelmqvist1-May-15 1:14 
AnswerRe: What about the transpose operation? Pin
james.manning14-May-15 10:44
memberjames.manning14-May-15 10:44 
QuestionBug Pin
xidongzhang25-Nov-14 6:57
memberxidongzhang25-Nov-14 6:57 
AnswerRe: Bug Pin
Sten Hjelmqvist8-Dec-14 22:38
memberSten Hjelmqvist8-Dec-14 22:38 
GeneralMy vote of 5 Pin
Amir Mehrabi-Jorshari13-Nov-13 10:06
memberAmir Mehrabi-Jorshari13-Nov-13 10:06 
QuestionPerfect for my needs Pin
Greg Lindberg5-Nov-13 3:32
memberGreg Lindberg5-Nov-13 3:32 
AnswerRe: Perfect for my needs Pin
Sten Hjelmqvist5-Nov-13 4:01
memberSten Hjelmqvist5-Nov-13 4:01 
GeneralMy vote of 5 Pin
meys_online26-Aug-12 13:38
membermeys_online26-Aug-12 13:38 
QuestionThanks - very interesting. Pin
Simon Hill20-Jan-12 15:14
memberSimon Hill20-Jan-12 15:14 
AnswerRe: Thanks - very interesting. Pin
Sten Hjelmqvist25-Mar-12 23:40
memberSten Hjelmqvist25-Mar-12 23:40 
QuestionDid you port this code from somewhere? :) Pin
Mikael Svenson8-Sep-11 23:05
memberMikael Svenson8-Sep-11 23:05 
Looking at the lines:

if (Math.Max(sRow.Length, sCol.Length) > Math.Pow(2, 31))
    throw (new Exception("\nMaximum string length in Levenshtein.iLD is " + Math.Pow(2, 31) +
    ".\nYours is " + Math.Max(sRow.Length, sCol.Length) + "."));

it looks to me like this is ported code. Let's examine it:

Math.Pow(2, 31)
is the same as int.MaxValue.

Math.Max(sRow.Length, sCol.Length)
will return a value <= int.MaxValue as Length from a string is of type int.

Thus, the whole if condition can be removed as the left side will never be greater than the right side.

A pretty easy spot Smile | :)
AnswerRe: Did you port this code from somewhere? :) Pin
Sten Hjelmqvist9-Sep-11 0:39
memberSten Hjelmqvist9-Sep-11 0:39 
GeneralGreat stuff Pin
ed_ward_graham28-Feb-11 5:41
membered_ward_graham28-Feb-11 5:41 
QuestionOperation Performed on strings? Pin
Piyush Patell4-Oct-10 17:11
memberPiyush Patell4-Oct-10 17:11 
GeneralVery useful. Pin
OrganicHuman24-Aug-09 10:22
memberOrganicHuman24-Aug-09 10:22 
QuestionCommercial quality? Pin
Blake Johnston21-Nov-07 18:37
memberBlake Johnston21-Nov-07 18:37 
AnswerRe: Commercial quality? Pin
Sten Hjelmqvist21-Nov-07 22:23
memberSten Hjelmqvist21-Nov-07 22:23 
NewsImprovements Available [modified] Pin
cetinsert6-Nov-07 4:49
membercetinsert6-Nov-07 4:49 
GeneralRe: Improvements Available Pin
Marc Brooks28-Jul-08 12:49
memberMarc Brooks28-Jul-08 12:49 
GeneralRe: Improvements Available Pin
cetinsert7-Aug-08 0:15
membercetinsert7-Aug-08 0:15 
BugRe: Improvements Available [modified] Pin
hbrowser4-Jul-12 4:33
memberhbrowser4-Jul-12 4:33 
Generalattractive title Pin
v2.022-Mar-06 5:09
memberv2.022-Mar-06 5:09 
GeneralRe: attractive title Pin
aprenot22-Mar-06 7:27
memberaprenot22-Mar-06 7:27 
GeneralRe: attractive title Pin
Thanh Dao22-Mar-06 22:16
memberThanh Dao22-Mar-06 22:16 
GeneralRe: attractive title Pin
Sten Hjelmqvist26-Mar-06 23:18
memberSten Hjelmqvist26-Mar-06 23:18 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170924.2 | Last Updated 26 Mar 2012
Article Copyright 2006 by Sten Hjelmqvist
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid