Levenshtein Edit Distance Algorithm






4.97/5 (16 votes)
Implement the Levenshtein Edit Distance algorithm.
Introduction
This source code is for understanding the Levenshtein Edit Distance algorithm easily. When I began to search this algorithm, I was so scared to implement it. I read all of the articles about this algorithm and I really didn't understand anything. When my research was deepening, I was trying to figure out the secret of this common algorithm. When I finished implementing it, I decided to write an article on CodeProject.com. I am hoping my implementation would be useful for people who work to understand The Levenshtein Edit Distance algorithm.
Background
This algorithm is so commonly used for Microsoft Outlook or Microsoft Word Spell Checking systems and Google. So thanks Vladimir Levenshtein for this algorithm. I will not try to explain the algorithm's details to you. The purpose of this article is for you to follow steps and see the flow chart of algorithm easily.
About the Algorithm
The edit distance between two strings is defined as the minimum number of edit operations required to transform one string into another. Let’s assume that the first string is named as the target string and the second string is named as the source string. We want to convert the source string to target. An edit is one of three operations: a delete (a character from the source string), an insert (a character from the target string), and a substitute (a character from the source string with a character from the target string). There is a fourth operation, named as match, which does not count as an edit. Consider two input strings "activate" and "caveat". Below you can see one possible transformation. In the example, a transformation is represented by a string consisting of the characters M for match, S for substitute, D for delete, and I for insert.
D M D S M I M M D
a c t i v a t e
c a v e a t
Using the code
I will share some code snippets of my implementation here:
In our implementation:
- Delete cost = 0.7
- Insert cost = 0.7
- Substitute cost = 1.0
Match does not have a cost.
Reference = Ref_text_box.Text;
Hypotheses = Hyp_text_box.Text;
distance_table = new double[Reference.Length + 1, Hypotheses.Length + 1];
for (int i = 0; i <= Reference.Length; i++)
distance_table[i, 0] = i * 0.7;
for (int j = 0; j <= Hypotheses.Length; j++)
distance_table[0, j] = j * 0.7;
for (int i = 1; i <= Reference.Length; i++)
{
for (int j = 1; j <= Hypotheses.Length; j++)
{
if (Reference[i - 1] == Hypotheses[j - 1])//if the letters are same
distance_table[i, j] = distance_table[i - 1, j - 1];
else //if not add 1 to its neighborhoods and assign minumun of its neighborhoods
{
distance_table[i, j] = Math.Min(Math.Min(distance_table[i - 1, j - 1] + 1,
distance_table[i - 1, j] + 0.7), distance_table[i, j - 1] + 0.7);
}
}
}
//create table
//Compute Distance
String distance_string = " ";
String distance_result;
int i = Reference.Length, j = Hypotheses.Length;
while (true)
{
this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
if (i == 0 && j == 0)
{
this.dataGridView1.Rows[i].Cells[j].Style.BackColor = Color.LightGreen;
distance_string_textbox.Text = distance_string;
char[] a = distance_string.ToCharArray();
Array.Reverse(a);
distance_result = new string(a);
distance_string_textbox.Text = distance_result;
break;
}
else if (i == 0 && j > 0)
{
this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
distance_string += "d";//delete
j--;
}
else if (i > 0 && j == 0)
{
this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
distance_string += "i";//insert
i--;
}
else
{
if (distance_table[i - 1, j - 1] <= distance_table[i - 1, j] &&
distance_table[i - 1, j - 1] <= distance_table[i, j - 1])
{
this.dataGridView1.Rows[i].Cells[j+1].Style.BackColor = Color.LightGreen;
if (distance_table[i - 1, j - 1] == distance_table[i, j])
distance_string += "m"; //match
else
distance_string += "s"; //substitue
i--;
j--;
}
else if (distance_table[i - 1, j] < distance_table[i, j - 1])
{
this.dataGridView1.Rows[i ].Cells[j+1].Style.BackColor = Color.LightGreen;
distance_string += "i"; //insert
i--;
}
else if (distance_table[i, j - 1] < distance_table[i - 1, j])
{
this.dataGridView1.Rows[i].Cells[j +1].Style.BackColor = Color.LightGreen;
distance_string += "d"; //delete
j--;
}
}
}
Points of Interest
This code snippet will teach to understand the Levenshtein Edit Distance algorithm easily. You can try some words and sentences to compare each other and you can follow the distance table to understand how to compute the distance of your strings.