13,194,617 members (50,919 online)
alternative version

#### Stats

93.1K views
121 bookmarked
Posted 1 Jun 2009

# Fuzzy Search

, 2 Jun 2009
 Rate this:
A simple implementation of the fuzzy string search.

## Introduction

This article describes a simple implementation of the fuzzy (string) search. It can be used for approximate string matching (for more information, see http://en.wikipedia.org/wiki/Fuzzy_string_searching).

Other algorithms for approximate string searching exist (e.g., Soundex), but those aren't as easy to implement. The algorithm in this article is easy to implement, and can be used for tasks where approximate string searching is used in an easy way.

A `List<string>` is used for searching, and therefore it's quite easy to search a database.

## Background

The algorithm used the Levenshtein-distance for determining how exact a string from a word list matches the word to be found. Information about the Levenshtein-distance can be found at http://en.wikipedia.org/wiki/Levenshtein_distance.

## Using the code

The following example will show how simply the class can be used.

```static void Main(string[] args)
{
string word = "Code Project";
List<string> wordList = new List<string>
{
"Code Project",
"Code project",
"codeproject",
"Code Projekt",
"Kode Project",
"Other Project"
};

List<string> foundWords = FuzzySearch.Search(
word,
wordList,
0.70);

foundWords.ForEach(i => Console.WriteLine(i));
}```

Output:

```Code Project
Code project
codeproject
Code Projekt
Kode Project```

## Implementation

A basic approach is shown. Instead of the Levenshtein-distance, a more optimized algorithm could be used - but here, a quite simple implementation is given for clarity reasons.

### Levenshtein-distance

For computing the Levenshtein-distance, I use the following algorithm:

```public static int LevenshteinDistance(string src, string dest)
{
int[,] d = new int[src.Length + 1, dest.Length + 1];
int i, j, cost;
char[] str1 = src.ToCharArray();
char[] str2 = dest.ToCharArray();

for (i = 0; i <= str1.Length; i++)
{
d[i, 0] = i;
}
for (j = 0; j <= str2.Length; j++)
{
d[0, j] = j;
}
for (i = 1; i <= str1.Length; i++)
{
for (j = 1; j <= str2.Length; j++)
{

if (str1[i - 1] == str2[j - 1])
cost = 0;
else
cost = 1;

d[i, j] =
Math.Min(
d[i - 1, j] + 1,              // Deletion
Math.Min(
d[i, j - 1] + 1,          // Insertion
d[i - 1, j - 1] + cost)); // Substitution

if ((i > 1) && (j > 1) && (str1[i - 1] ==
str2[j - 2]) && (str1[i - 2] == str2[j - 1]))
{
d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost);
}
}
}

return d[str1.Length, str2.Length];
}```

### The Searching

In the search process, for each word in the wordlist, the Levenshtein-distance is computed, and with this distance, a score. This score represents how good the strings match. The input argument `fuzzyness` determines how much the strings can differ.

```public static List<string> Search(
string word,
List<string> wordList,
double fuzzyness)
{
List<string> foundWords = new List<string>();

foreach (string s in wordList)
{
// Calculate the Levenshtein-distance:
int levenshteinDistance =
LevenshteinDistance(word, s);

// Length of the longer string:
int length = Math.Max(word.Length, s.Length);

// Calculate the score:
double score = 1.0 - (double)levenshteinDistance / length;

// Match?
if (score > fuzzyness)
}
return foundWords;
}```

#### LINQ-variant

This piece of code could be written in LINQ too.

```public static List<string> Search(
string word,
List<string> wordList,
double fuzzyness)
{
// Tests have prove that the !LINQ-variant is about 3 times
// faster!
List<string> foundWords =
(
from s in wordList
let levenshteinDistance = LevenshteinDistance(word, s)
let length = Math.Max(s.Length, word.Length)
let score = 1.0 - (double)levenshteinDistance / length
where score > fuzzyness
select s
).ToList();

return foundWords;
}```

## History

• 2009 June 1st: Initial release.

## Share

 Software Developer (Senior) Foidl Günther Austria
Engineer in combustion engine development.
Programming languages: C#, FORTRAN 95, Matlab

FIS-overall worldcup winner in Speedski (Downhill) 2008/09 and 2009/10.

## You may also be interested in...

 First Prev Next
 qwestion scheee17-Jan-13 1:42 scheee 17-Jan-13 1:42
 Parallelize may help but it is quite limited Member 95751485-Nov-12 18:29 Member 9575148 5-Nov-12 18:29
 My vote of 5 Petr Abdulin18-Apr-12 19:36 Petr Abdulin 18-Apr-12 19:36
 My vote of 5 Arlen Navasartian9-Apr-12 8:26 Arlen Navasartian 9-Apr-12 8:26
 My vote of 5 manoj kumar choubey26-Feb-12 21:32 manoj kumar choubey 26-Feb-12 21:32
 Works Great ! damon8810-Dec-09 19:27 damon88 10-Dec-09 19:27
 Re: Works Great ! Günther M. FOIDL29-Dec-09 13:33 Günther M. FOIDL 29-Dec-09 13:33
 About dictionary Win32nipuh21-Oct-09 21:54 Win32nipuh 21-Oct-09 21:54
 Re: About dictionary Günther M. FOIDL22-Oct-09 10:55 Günther M. FOIDL 22-Oct-09 10:55
 great explanation Win32nipuh8-Jun-09 20:29 Win32nipuh 8-Jun-09 20:29
 Re: great explanation Günther M. FOIDL9-Jun-09 14:31 Günther M. FOIDL 9-Jun-09 14:31
 Re: great explanation coduresearch2-Sep-09 7:45 coduresearch 2-Sep-09 7:45
 Re: great explanation Günther M. FOIDL10-Sep-09 6:22 Günther M. FOIDL 10-Sep-09 6:22
 Re: great explanation Nicolas Dorier21-Oct-09 8:28 Nicolas Dorier 21-Oct-09 8:28
 Searching large text daniel.zolnjan4-Jun-09 5:56 daniel.zolnjan 4-Jun-09 5:56
 Re: Searching large text Günther M. FOIDL4-Jun-09 9:25 Günther M. FOIDL 4-Jun-09 9:25
 I would also have used extension methods to make my code more readable Moshe Plotkin1-Jun-09 14:06 Moshe Plotkin 1-Jun-09 14:06
 Re: I would also have used extension methods to make my code more readable Günther M. FOIDL1-Jun-09 21:46 Günther M. FOIDL 1-Jun-09 21:46
 Linq does not seam to be slower [modified] Moshe Plotkin1-Jun-09 13:55 Moshe Plotkin 1-Jun-09 13:55
 Re: Linq does not seam to be slower Günther M. FOIDL1-Jun-09 21:44 Günther M. FOIDL 1-Jun-09 21:44
 Re: Linq does not seam to be slower Moshe Plotkin2-Jun-09 2:15 Moshe Plotkin 2-Jun-09 2:15
 Re: Linq does not seam to be slower Günther M. FOIDL2-Jun-09 3:14 Günther M. FOIDL 2-Jun-09 3:14
 Re: Linq does not seam to be slower riskka10-Jun-09 2:28 riskka 10-Jun-09 2:28
 more optimized algorithm? Unruled Boy1-Jun-09 3:11 Unruled Boy 1-Jun-09 3:11
 Re: more optimized algorithm? Günther M. FOIDL1-Jun-09 3:22 Günther M. FOIDL 1-Jun-09 3:22
 Re: more optimized algorithm? gstolarov10-Jun-09 12:18 gstolarov 10-Jun-09 12:18
 Re: more optimized algorithm? Psycho_Coder22-May-14 19:10 Psycho_Coder 22-May-14 19:10
 Last Visit: 31-Dec-99 18:00     Last Update: 19-Oct-17 3:28 Refresh 1