Click here to Skip to main content
12,943,528 members (56,927 online)
Click here to Skip to main content
Add your own
alternative version

Tagged as


3 bookmarked
Posted 17 Feb 2014

Using Morris Pratt String Searching In Byte Arrays

, 17 Feb 2014 CPOL
Rate this:
Please Sign up or sign in to vote.
A short discussion of string searching in raw data.


This is a tip about how I use Morris Pratt string searching algorithm in byte arrays to get the same efficient result. I'm assuming that you've understood the basic concept of this algorithm so it won't be repeated in this topic. Also, you should already know extension methods before you get started with this topic.


In 2013, I built an extension method of Morris Pratt searching algorithm to search specific string in text file. But later, I had an idea and tried to apply it on every enumerable-related type if each element in it is equatable. So I expanded my method and then get the same efficient result if I search it in byte array, which means I can directly search string in raw text file and don't have to use ASCII or Unicode to decode it at first. The program became more efficient in text searching.

Using the Code

It's easy to use these extensions. First of all, let's take a look of their prototypes:

public static int MorrisPrattSearchFirst<T>(this T[] t, T[] p)
    where T : IEquatable<T> { }

public static int MorrisPrattSearchFirst<T>(this T[] t, T[] p, int startIndex)
    where T : IEquatable<T> { }

where t is the parent array and p is the specified array for searching. The second method allows users to search p in t starting at non-zero index. Constraint on type parameter T makes sure that each element in both arrays is equatable. Returned value represents the index of first matched result (-1 if not found). Both of them are defined under the CollectionAssistant.IEnumerableExtentions class.

Here's a complete example of using them. First, we search key word "Ishmael" in character arrays from the text file MobyDick.txt. And next search it in ASCII-encoded byte arrays. Then compare the efficiency.

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace CollectionAssistant.Demo
    class Program
        static void Main(string[] args)
            // This is a example which demonstrates the differences 
            // between searching text in characters and in raw byte-array.
            string keyWord = "Ishmael";

            // Searching key word in character array. 
            char[] originalCharArray = File.ReadAllText("MobyDick.txt").ToCharArray();
            char[] keyWordCharArray = keyWord.ToCharArray();

            int index = -1;
            int pos = 0;

            Console.WriteLine("Start searching key word in character array.");

            Stopwatch watch = Stopwatch.StartNew();

                // Using the extension method.
                index = originalCharArray.MorrisPrattSearchFirst(keyWordCharArray, pos);

                if (index >= 0)
                    Console.WriteLine("Key word \"{0}\" found. Index: {1}", keyWord, index);
                    pos += (index + keyWordCharArray.Length);
            } while (index >= 0 && pos < originalCharArray.Length);

            Console.WriteLine("Elapsed ticks: {0}", watch.ElapsedTicks); 

            pos = 0; 
            Console.WriteLine("Start searching key word in byte array.");

            // Searching key word in byte array.
            byte[] originalByteArray = File.ReadAllBytes("MobyDick.txt");
            byte[] keyWordByteArray = Encoding.Default.GetBytes(keyWord);


                // Using the extension method.
                index = originalByteArray.MorrisPrattSearchFirst(keyWordByteArray, pos);

                if (index >= 0)
                    Console.WriteLine("Key word \"{0}\" found. Index: {1}", keyWord, index);
                    pos += (index + keyWordByteArray.Length);
            } while (index >= 0 && pos < originalByteArray.Length);


            Console.WriteLine("Elapsed ticks: {0}", watch.ElapsedTicks); 

The result shows that both of them performed almost same efficiently: about 3000 ~ 8000 ticks in this case.

Test result.


Searching text in character arrays and in byte arrays are both efficient while using Morris Pratt algorithm. But sometimes, it's better in byte arrays especially for some situations such as text file or multipart/form-data - which means parent byte array possibly contains both text and non-text data and can't be decoded into character array or string.

You can obtain the complete source code from here.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Robert Vandenberg Huang
Software Developer
Taiwan Taiwan
Software developer, video game enthusiast, drummer and also a huge Jazz music fan. Have been working on software development since junior high school. Love sharing knowledge with people, learning new things and having interaction with developer community.

My Homepage at GitHub

You may also be interested in...


Comments and Discussions

-- There are no messages in this forum --
Permalink | Advertise | Privacy | Terms of Use | Mobile
Web02 | 2.8.170518.1 | Last Updated 18 Feb 2014
Article Copyright 2014 by Robert Vandenberg Huang
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid