13,199,852 members (67,549 online)
alternative version

#### Stats

30K views
25 bookmarked
Posted 4 Apr 2009

# A .NET implementation for the Knuth-Moris-Pratt (KMP) algorithm

, 4 Apr 2009
 Rate this:
A .NET implementation for the Knuth-Moris-Pratt (KMP) algorithm.

## Introduction

This article about the Knuth-Moris-Pratt algorithm (KMP). KMP is a string matching algorithm that allows you to search patterns in a string in O(n) time and O(m) pre-proccesing time, where n is the text length and m is the pattern length.

## Background

The KMP algorithm first calculates a transition array that tells how many shifts the pattern is shifted when a mismatch occurs.

## Implementation

The `PrefixArray` class takes a `string` parameter, the pattern, and is responsible for calculating the prefix function and returning an array that contains the transition indexes.

In calculating the prefix array, care has been taken to give maximum performance; hence, the general implementation has been tweaked in several places.

```for(int i = 1 ; i < pattern.Length ; i++)
{
temp        = SubCharArray(i, patternArray);
hArray[i]   = GetPrefixLegth(temp, firstChar);
}```

The code above shows computing the prefix function; the loop in the code iterates through the pattern and calculates the prefix function for each index. (Note: the iteration starts from 1 as we know that the transition at index 0 is 0.)

The `temp` array represents all the characters from the 0th index of the pattern to the index of the current loop. This character array is passed into the `GetPrefixLength()` function, which actually computes the prefix function.

Next, we have the code for the `GetPrefixLength()` function:

```private static int GetPrefixLegth(char[] array, char charToMatch)
{
for(int i = 2; i < array.Length; i++)
{
//if it is a match
if(array[i] == charToMatch)
{
if(IsSuffixExist(i, array))
{
//Return on the first prefix which is the largest
return array.Length - i;
}
}
}
return 0;
}```

This function takes in the array we discussed as a parameter, and also the first character of the pattern (`charToMatch`).

The array is iterated and searched for a match with the first character of the pattern (the first character needs to be matched, the prefix starts with the first character). If the first character exist in the array, then we calculate the longest suffix that is a prefix of the pattern.

Obviously, the first match gives us the longest suffix that is a prefix of the pattern.

The code below depicts the function `IsSufficExists()`:

```private static bool IsSuffixExist(int index, char[] array)
{
//Keep track of the prefix index
int k = 0;
for(int i = index; i < array.Length ; i++)
{
//A mismatch so return
if(array[i] != array[k]){ return false;}
k++;
}
return true;
}```

Next, we will look into the code snippet that actually finds the occurrence of the pattern in a string using the transition array for the pattern.

```for(int i = 0; i < charArray.Length; i++)
{
//If there is a match
if(charArray[i] == patternArray[k])
{
//Move the pattern index by one
k++;
}
else
{
//There is a mismatch..so move the pattern

//The amount to move through the pattern
int prefix = transitionArray[k];
//if the current char does not match
//the char in the pattern that concides
//when moved then shift the pattern entirley, so
//we dont make a unnecssary comparision
if(prefix + 1 > patternArray.Length &&
charArray[i] != patternArray[prefix + 1])
{
k = 0;
}
else
{
k = prefix;
}
}```

The loop iterates through the string to search the pattern. If the current character matches the character in the pattern index (the character index that should be matched in the iteration), then there is a match, and we increment the index of the pattern and continue.

If there is a mismatch, then we get the transition index for the specific index, and we see if the character at the transition index + 1 matches with the character that is being matched (this is done so we don’t need to unnecessarily match the character again). If there is a match, then we move the pattern index with the transition index; if not, we move the pattern index with 0.

Next, we see if the pattern index is equal to the pattern length; if so, then we have a match. This code snippet is shown below:

```//A complet match, if kis
//equal to pattern length
if(k == patternArray.Length)
{
//Add it to our result
result.Add(i - (patternArray.Length - 1));
//Set k as if the next character is a mismatch
//therefore we don’t miss out any other containing pattern
k  = transitionArray[k - 1];
}```

## Using the code

In order to use the code, all you got to do is build the files in the source provided, and use the static method `GetAllOccurences(string pattern, string text)` of the `KMPUtil` class like this:

`KMPUtil.GetAllOccurences("ab", "abhsdsabsbabaa");`

This method will return an `ArrayList` of indexes where the pattern occurs in the text (zero based).

## About the Author

 Software Developer (Senior) Sri Lanka
I am a Senior Software Engineer at Virtusa, Sri Lanka, I have an IT degree from University of Colombo and currently reading for my Master in Computer Science at the same University.

 Pro

## Comments and Discussions

 First Prev Next
 not match thangnt8415-Apr-14 23:37 thangnt84 15-Apr-14 23:37
 i'm try with - Partern: "aabhsdsab" Target: "abhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsb abaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaaabhsdsabsbabaa" Plz, Debug with me ???
 The order of prefix function calculation algorithm is quadratic Amit_Rastogi28-Jan-12 8:47 Amit_Rastogi 28-Jan-12 8:47
 Cheers Member 827097727-Sep-11 3:59 Member 8270977 27-Sep-11 3:59
 this code is bugged. rivermsfly11-Jun-11 18:20 rivermsfly 11-Jun-11 18:20
 Wrong Output kantele849-Mar-10 4:54 kantele84 9-Mar-10 4:54
 Re: Wrong Output Nairooz Nilafdeen2-May-10 5:23 Nairooz Nilafdeen 2-May-10 5:23
 Comparison with other approaches Daniel Vaughan5-Apr-09 1:05 Daniel Vaughan 5-Apr-09 1:05
 ArrayList? Dmitri Nesteruk4-Apr-09 23:49 Dmitri Nesteruk 4-Apr-09 23:49
 Re: ArrayList? Nairooz Nilafdeen5-Apr-09 2:53 Nairooz Nilafdeen 5-Apr-09 2:53
 Last Visit: 31-Dec-99 18:00     Last Update: 22-Oct-17 17:33 Refresh 1

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.