Click here to Skip to main content
13,297,266 members (70,877 online)
Click here to Skip to main content
Add your own
alternative version


56 bookmarked
Posted 20 Jan 2006

Boyer-Moore and related exact string matching algorithms

, 20 Jan 2006
Rate this:
Please Sign up or sign in to vote.
Implementations of Boyer-Moore and several related string matching algorithms in C# 2.0.


This article presents a straightforward implementation of the Boyer-Moore string matching algorithm and a number of related algorithms.


When searching for one string within another, .NET programmers normally reach for String.IndexOf. For the vast majority of applications, that's just fine. However, when searching very long strings, there are other algorithms that deliver more performance than the basic algorithm that .NET provides. This article gives simple implementations of several of those algorithms.

It's worth noting that String.IndexOf performs a much more complex comparison than these algorithms provide, since String.IndexOf performs a culture-sensitive comparison - this is a highly non-trivial matter when a case insensitive match is desired! If you need culture sensitive matching, stick to String.IndexOf.

What this article doesn't cover

This article does not attempt to teach you how to write high performance string matching algorithms, nor does it try to explain the algorithms that are presented. If you're looking for that kind of information, consult an algorithms book such as Knuth or Cormen, et al. An excellent online reference for string matching algorithms is the Handbook of Exact String Matching Algorithms.

Using the code

The BoyerMooreSearch class provides implementations of the original Boyer-Moore algorithm plus a number of related algorithms. An implementation that uses String.IndexOf is also provided for comparison.

The text to be found is passed to the BoyerMooreSearch constructor. Each searching algorithm is presented using a common IEnumerable<int> model.

namespace StringMatch
  public class BoyerMoore
    /// <summary>
    /// Constructor
    /// </summary>
    /// <param name="pattern">Pattern for search</param>
    public BoyerMoore(string pattern) { ... }

    /// <summary>
    /// Return all matched of the pattern
    /// in the specified text using algorithm name
    /// </summary>
    /// <param name="text">text to be searched</param>
    /// <param name="startingIndex">Index at 
    ///       which search begins</param>
    /// <returns>IEnumerable which returns
    /// the indexes of pattern matches</returns>
    public IEnumerable<int> AlgorithmNameMatch(string text, 
                                 int startingIndex) { ... }

    /// <summary>
    /// Return all matched of the pattern in the
    /// specified text using algorithm name
    /// </summary>
    /// <param name="text">text to be searched</param>
    /// <returns>IEnumerable which returns the indexes of pattern matches</returns>
    public IEnumerable<int> AlgorithmNameMatch(string text) { ... }

Each algorithm returns an IEnumerable<int> that yields all of the indexes where the pattern (passed to the constructor) is found in the text (passed to the algorithm). This design allows the pre-computation that all of the Boyer-Moore related algorithms make use of to be shared across any number of searches for a specific pattern.

BoyerMoore bm = new BoyerMoore("abcd");
foreach (int index in bm.BoyerMooreMatch("abcdsflkjwertabc" + 
         "dfglkjdfgcdjweertopabjsdvacbwer" + 
  Console.WriteLine("Matched at index {0}", index);

This example will list matches at 0, 13, and 72.

The Algorithms

The BoyerMoore class implements the following algorithms:

  • BCLMatch - search using String.IndexOf.
  • HorspoolMatch - search using the Horspool algorithm. This is functionally identical to the "Quick Search" algorithm and is presented in some algorithms books as being the Boyer-Moore algorithm. Due to its simplicity, this is often the highest performing of the Boyer-Moore-like algorithms even though others have better theoretical performance.
  • BoyerMooreMatch - search using the Boyer-Moore algorithm.
  • TurboBoyerMooreMatch - search using the Turbo Boyer-Moore algorithm.
  • ApostolicoGiancarloMatch - search using the Apostolico-Giancarlo algorithm. Although this algorithm has the best theoretical performance (when considering only character comparisons, as is traditional when evaluating matching algorithms), it is often the slowest in practice due to the complexity of the code.

The demo program

The supplied demo program is a simple console application that searches a specified file for all occurrences of a string, reporting the time taken by each algorithm. Running the demo application on my machine (3Ghz P4) searching a large text file for a frequently occurring string, yielded the following times:

Finding all occurrences of pattern 
         in path (7324800 characters in length)
String.indexOf found 87473 matches in 0.0579839 seconds
Horspool found 21832 matches in 0.0180492 seconds
Boyer-Moore found 87473 matches in 0.0196312 seconds
Turbo Boyer-Moore found 87473 matches in 0.0282827 seconds
Apostolico-GiancarloMatch found 87473 matches in 0.0563225 seconds

I my experience, the ratios shown here are typical, but your mileage will vary depending on what you're looking for and how many partial matches there are in the searched text.


  • 17-Jan-2006 - Initial version.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Carl Daniel
United States United States
No Biography provided

You may also be interested in...

Comments and Discussions

SuggestionChange "chunkIndex" datatype to long. Pin
2xman4-Mar-17 17:48
member2xman4-Mar-17 17:48 
GeneralMy vote of 5 Pin
Armando de la Torre4-Sep-11 10:59
memberArmando de la Torre4-Sep-11 10:59 
GeneralNot able to cope with Unicode? Pin
rivermsfly11-Jun-11 23:37
memberrivermsfly11-Jun-11 23:37 
GeneralRe: Not able to cope with Unicode? Pin
Carl Daniel12-Jun-11 5:53
memberCarl Daniel12-Jun-11 5:53 
GeneralWarning about the case-sensitiveness Pin
Fernando Nájera3-Dec-10 1:08
memberFernando Nájera3-Dec-10 1:08 
GeneralRe: Warning about the case-sensitiveness Pin
Carl Daniel3-Dec-10 5:48
memberCarl Daniel3-Dec-10 5:48 
GeneralActual Perf Data (and Horspool bug) Pin
mpgflx22-Jun-10 7:27
membermpgflx22-Jun-10 7:27 
I generated ~1MB of lorem ipsum text ( and searched for the string "ipsum". I modified the timing code to run each alg 5 times, take the average time, and present the results as a ratio relative to IndexOf. I'm using .NET 4 (VS10), built in Release mode.

The data still fluctuates quite a bit, but is accurate to one significant figure:

IndexOf: 1.0
Horspool: 0.5 *
BoyerMoore: 0.3
TurboBoyerMoore: 0.5
ApostolicoGiancarlo: 0.7

* NOTE: The Horspool algorithm does not find the same number of matches as the other algorithms -- and, indeed, in Carl's posted results in the article it appears Horspool's match count differed as well, so I'd not count that timing result as correct.

Nonetheless, thanks for coding this up!

GeneralRe: Actual Perf Data (and Horspool bug) Pin
Fernando Nájera3-Dec-10 0:57
memberFernando Nájera3-Dec-10 0:57 
QuestionAbout the License of BoyerMooreSearch? Pin
neesha_anj17-Jun-10 12:16
memberneesha_anj17-Jun-10 12:16 
AnswerRe: About the License of BoyerMooreSearch? Pin
Carl Daniel17-Jun-10 12:31
memberCarl Daniel17-Jun-10 12:31 
GeneralRe: About the License of BoyerMooreSearch? Pin
neesha_anj17-Jun-10 12:46
memberneesha_anj17-Jun-10 12:46 
QuestionIndexOf performance improvement? Pin
summersgill30-Nov-09 5:00
membersummersgill30-Nov-09 5:00 
AnswerRe: IndexOf performance improvement? Pin
Carl Daniel17-Jun-10 12:36
memberCarl Daniel17-Jun-10 12:36 
GeneralThis code doesn't work at all Pin
egr26-Mar-09 9:50
memberegr26-Mar-09 9:50 
GeneralRe: This code doesn't work at all Pin
Carl Daniel17-Jun-10 12:34
memberCarl Daniel17-Jun-10 12:34 
GeneralBug in horspool Pin
Messenjah8-Sep-07 4:07
memberMessenjah8-Sep-07 4:07 
GeneralRe: Bug in horspool Pin
Fernando Nájera3-Dec-10 0:57
memberFernando Nájera3-Dec-10 0:57 
QuestionBug: Non - ASCII UNICODE character cause misbehave Pin
cemmerven17-Apr-07 12:30
membercemmerven17-Apr-07 12:30 
AnswerRe: Bug: Non - ASCII UNICODE character cause misbehave Pin
Carl Daniel17-Jun-10 12:33
memberCarl Daniel17-Jun-10 12:33 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.171207.1 | Last Updated 20 Jan 2006
Article Copyright 2006 by Carl Daniel
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid