Click here to Skip to main content
15,887,135 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have implemented the BoyerMooore class from here https://stackoverflow.com/questions/37500629/find-byte-sequence-within-a-byte-array

but it does not give the results expected
here the code implemented:

What I have tried:

Update: one flaw it reads the entire buffer, no matter who much was read with gave to much result, now i get correct 120 on pattern "00" but still only 5 on the 16 bytes pattern which should be 10

Update2: but change a line i get correct for the 16 bytes pattern, and all for 1 - to 16 except the 15 bytes, which I only get 5 from, but should be 10 also.

using System;
using System.Collections.Generic;

namespace Test
{
    /// <summary>
    /// Description of Boyer.
    /// </summary>

    public sealed class BoyerMoore
    {
    readonly byte[] needle;
    readonly int[] charTable;
    readonly int[] offsetTable;

    public BoyerMoore(byte[] needle)
    {
        this.needle = needle;
        this.charTable = makeByteTable(needle);
        this.offsetTable = makeOffsetTable(needle);
    }

    public IEnumerable<int> Search(byte[] haystack,bytesRead) //send with how much was read
    {
        if (needle.Length == 0)
            yield break;

        for (int i = needle.Length - 1; i < bytesRead;) //changed here
        {
            int j;

            for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j)
            {
                if (j != 0)
                    continue;

                yield return i;
                i += needle.Length - 1;
                break;
            }

            i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]);
        }
    }

    static int[] makeByteTable(byte[] needle)
    {
        const int ALPHABET_SIZE = 256;
        int[] table = new int[ALPHABET_SIZE];

        for (int i = 0; i < table.Length; ++i)
            table[i] = needle.Length;

        for (int i = 0; i < needle.Length - 1; ++i)
            table[needle[i]] = needle.Length - 1 - i;

        return table;
    }

    static int[] makeOffsetTable(byte[] needle)
    {
        int[] table = new int[needle.Length];
        int lastPrefixPosition = needle.Length;

        for (int i = needle.Length - 1; i >= 0; --i)
        {
            if (isPrefix(needle, i + 1))
                lastPrefixPosition = i + 1;

            table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1;
        }

        for (int i = 0; i < needle.Length - 1; ++i)
        {
            int slen = suffixLength(needle, i);
            table[slen] = needle.Length - 1 - i + slen;
        }

        return table;
    }

    static bool isPrefix(byte[] needle, int p)
    {
        for (int i = p, j = 0; i < needle.Length; ++i, ++j)
            if (needle[i] != needle[j])
                return false;

        return true;
    }

    static int suffixLength(byte[] needle, int p)
    {
        int len = 0;

        for (int i = p, j = needle.Length - 2; i >= 0 && needle[i] == needle[j]; --i, --j) //cahnge for -1 to -2
            ++len;

        return len;
    }
}

}

Heres the code where make the pattern, save pattern of different length in a jagged array, no error here, pattern saved as intended.

    for(xtel=(bytesRead-1);xtel>=0;xtel--)
    {
        buffz+=buffer[xtel].ToString();
        if(ytel==0)
        {
            Array.Resize(ref pattern,ytel+1);
            pattern[ytel]=new byte[] {buffer[xtel]};
        }
        else
        {
            Array.Resize(ref pattern,ytel+1);
            pattern[ytel]=pattern[ytel-1];
            Array.Resize(ref pattern[ytel],pattern[ytel].Length+1);
            pattern[ytel][(pattern[ytel].Length-1)]=buffer[xtel];


            if(pattern[ytel].Length==31)
            {
                break;
            }
        }
        ytel++;
    }

Since IEnumerable does not have any count it should be count every time:

heres the code where loop through the jagged array and search

      Array.Resize(ref pregz,pattern.Length);

    for(xtel=0;xtel<pattern.Length;xtel++)
    {
        Array.Reverse(pattern[xtel]);

        var searcher = new BoyerMoore(pattern[xtel]);

        foreach (int index in searcher.Search(buffer))
        {
            pregz[xtel]++;
     }
    }


but my result is far less, than if I count in a hexeditor Checked to see the pattern, and that pattern was correct.

The result i get of 16 bytes patter in the method, is 41, but when check by hex i count 82, so somewhere there is a error.

I have also have tested by making a new file with example:

00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00

then copied 10 times to

00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00 00 00 00 01 00 00 0f a0 00 00 00 03 00 00 00 00

but getting weird results

like 2008 for the pattern 00

with should be only 120

and only 5 for the original pattern which should be 10

but correct for this pattern 10 when it should be 10 03 00 00 00 00
Posted
Updated 11-Apr-18 6:06am
Comments
BillWoodruff 11-Apr-18 14:32pm    
Why not post your results, and question, on the StackOverflow thread ?

1 solution

Here is a couple articles that might be of interest:
Combining Boyer-Moore String Search with Regular Expressions | Dr Dobb's[^]
Faster String Searches | Dr Dobb's[^]
Boyer-Moore and related exact string matching algorithms[^]
Efficient Boyer-Moore Search in Unicode Strings[^]

Advice: In order to help helpers, it is a good idea to post a piece of code that is autonomous (with a main function) and exhibit the problem, so that we can execute your code.
 
Share this answer
 
Comments
BillWoodruff 11-Apr-18 14:31pm    
+5
Patrice T 11-Apr-18 14:35pm    
Thank you.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900