Article

# A .NET implementation for the Knuth-Moris-Pratt (KMP) algorithm

By , 4 Apr 2009
 Rate this:

## Introduction

This article about the Knuth-Moris-Pratt algorithm (KMP). KMP is a string matching algorithm that allows you to search patterns in a string in O(n) time and O(m) pre-proccesing time, where n is the text length and m is the pattern length.

## Background

The KMP algorithm first calculates a transition array that tells how many shifts the pattern is shifted when a mismatch occurs.

## Implementation

The `PrefixArray` class takes a `string` parameter, the pattern, and is responsible for calculating the prefix function and returning an array that contains the transition indexes.

In calculating the prefix array, care has been taken to give maximum performance; hence, the general implementation has been tweaked in several places.

```for(int i = 1 ; i < pattern.Length ; i++)
{
temp        = SubCharArray(i, patternArray);
hArray[i]   = GetPrefixLegth(temp, firstChar);
}```

The code above shows computing the prefix function; the loop in the code iterates through the pattern and calculates the prefix function for each index. (Note: the iteration starts from 1 as we know that the transition at index 0 is 0.)

The `temp` array represents all the characters from the 0th index of the pattern to the index of the current loop. This character array is passed into the `GetPrefixLength()` function, which actually computes the prefix function.

Next, we have the code for the `GetPrefixLength()` function:

```private static int GetPrefixLegth(char[] array, char charToMatch)
{
for(int i = 2; i < array.Length; i++)
{
//if it is a match
if(array[i] == charToMatch)
{
if(IsSuffixExist(i, array))
{
//Return on the first prefix which is the largest
return array.Length - i;
}
}
}
return 0;
}```

This function takes in the array we discussed as a parameter, and also the first character of the pattern (`charToMatch`).

The array is iterated and searched for a match with the first character of the pattern (the first character needs to be matched, the prefix starts with the first character). If the first character exist in the array, then we calculate the longest suffix that is a prefix of the pattern.

Obviously, the first match gives us the longest suffix that is a prefix of the pattern.

The code below depicts the function `IsSufficExists()`:

```private static bool IsSuffixExist(int index, char[] array)
{
//Keep track of the prefix index
int k = 0;
for(int i = index; i < array.Length ; i++)
{
//A mismatch so return
if(array[i] != array[k]){ return false;}
k++;
}
return true;
}```

Next, we will look into the code snippet that actually finds the occurrence of the pattern in a string using the transition array for the pattern.

```for(int i = 0; i < charArray.Length; i++)
{
//If there is a match
if(charArray[i] == patternArray[k])
{
//Move the pattern index by one
k++;
}
else
{
//There is a mismatch..so move the pattern

//The amount to move through the pattern
int prefix = transitionArray[k];
//if the current char does not match
//the char in the pattern that concides
//when moved then shift the pattern entirley, so
//we dont make a unnecssary comparision
if(prefix + 1 > patternArray.Length &&
charArray[i] != patternArray[prefix + 1])
{
k = 0;
}
else
{
k = prefix;
}
}```

The loop iterates through the string to search the pattern. If the current character matches the character in the pattern index (the character index that should be matched in the iteration), then there is a match, and we increment the index of the pattern and continue.

If there is a mismatch, then we get the transition index for the specific index, and we see if the character at the transition index + 1 matches with the character that is being matched (this is done so we don’t need to unnecessarily match the character again). If there is a match, then we move the pattern index with the transition index; if not, we move the pattern index with 0.

Next, we see if the pattern index is equal to the pattern length; if so, then we have a match. This code snippet is shown below:

```//A complet match, if kis
//equal to pattern length
if(k == patternArray.Length)
{
//Set k as if the next character is a mismatch
//therefore we don’t miss out any other containing pattern
k  = transitionArray[k - 1];
}```

## Using the code

In order to use the code, all you got to do is build the files in the source provided, and use the static method `GetAllOccurences(string pattern, string text)` of the `KMPUtil` class like this:

`KMPUtil.GetAllOccurences("ab", "abhsdsabsbabaa");`

This method will return an `ArrayList` of indexes where the pattern occurs in the text (zero based).

Software Developer (Senior)
Sri Lanka
I am a Senior Software Engineer at Virtusa, Sri Lanka, I have an IT degree from University of Colombo and currently reading for my Master in Computer Science at the same University.

 First Prev Next
 Cheers Member 8270977 27-Sep-11 3:59
 this code is bugged. rivermsfly 11-Jun-11 18:20
 Wrong Output kantele84 9-Mar-10 4:54
 Re: Wrong Output Nairooz Nilafdeen 2-May-10 5:23
 Comparison with other approaches Daniel Vaughan 5-Apr-09 1:05
 ArrayList? Dmitri Nesteruk 4-Apr-09 23:49
 Why not return an `IEnumerable` instead?
 Re: ArrayList? Nairooz Nilafdeen 5-Apr-09 2:53
 Last Visit: 31-Dec-99 18:00     Last Update: 10-Mar-14 8:45 Refresh 1