Wildcard string compare (globbing)

Jack Handy

15 Feb 2005

Matches a string against a wildcard string such as "*.*" or "bl?h.*" etc. This is good for file globbing or to match hostmasks.

Usage:

This is a fast, lightweight, and simple pattern matching function.

if (wildcmp("bl?h.*", "blah.jpg")) {
  //we have a match!
} else {
  //no match =(
}

Function:

int wildcmp(const char *wild, const char *string) {
  // Written by Jack Handy - jakkhandy@hotmail.com
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string+1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

Written By

Jack Handy

Web Developer

United States

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

Fastest wildcard function benchmarked with 3 compilers

Sanmayce

28-Nov-22 14:55

Hi guys,
years fly, yet the need for speed remains. You are welcome to see the latest in WILDCARDS:
Dirwalker - Simplistic and Ergonomic Directory Browser

Unit tests please

richarno

2-Dec-19 22:03

Thanks for this nice piece of code. It may be useful in some constrained environments.
I voted 5 stars, although I would appreciate some unit tests, particularly for corner cases (for example, does "a*?c" match "abc"?).

Thanks.
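
The corner cases asked about here can be pinned down with a small table-driven check. What follows is a minimal sketch, assuming the wildcmp from the article is pasted in unchanged; the helper name run_wildcmp_tests and the selection of cases are illustrative and not part of the original article. Tracing the function by hand suggests that "a*?c" does match "abc" (the '*' matches nothing and the '?' takes the 'b').

```c
#include <stddef.h>

/* wildcmp as printed in the article (written by Jack Handy), repeated here
   only so that the sketch compiles on its own. */
int wildcmp(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string+1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

/* Hypothetical helper: returns how many cases differ from the expected
   result, so 0 means every case behaved as listed. */
int run_wildcmp_tests(void) {
  static const struct {
    const char *wild;
    const char *string;
    int expected;
  } cases[] = {
    { "bl?h.*",   "blah.jpg", 1 },
    { "a*?c",     "abc",      1 },  /* '*' matches nothing, '?' matches 'b' */
    { "a*?c",     "ac",       0 },  /* '?' still needs one character */
    { "*",        "",         1 },
    { "",         "",         1 },
    { "",         "a",        0 },
    { "????????", "ABC",      0 },  /* pattern longer than the text */
    { "y*n",      "yessir",   0 },  /* text would have to end in 'n' */
    { "*s",       "glass",    1 },
    { "a**b",     "axb",      1 },  /* doubled '*' behaves like one */
    { "a*bc",     "abbc",     1 },  /* needs one backtrack */
    { "a*b",      "a",        0 }
  };
  size_t i;
  int failures = 0;

  for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
    if (wildcmp(cases[i].wild, cases[i].string) != cases[i].expected) {
      failures++;
    }
  }
  return failures;
}
```

Calling run_wildcmp_tests() from a test runner and checking for 0 is enough; new corner cases can be added as extra rows in the table.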

nice algorithm, but with a weakness

senzabandiera

7-Oct-16 23:24

I like the simplicity of this algorithm; it solves wildcard string matching in a few lines.
I noticed a weakness when wild is something like "*s" and the string is long (N characters): it takes N iterations to find that the string ends with an s.
In the case of "*s", it would be faster to check directly that the string ends with the character s.
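
One way to act on this observation is a fast path for patterns of the form "*tail". The sketch below is only an illustration under assumptions: the function name match_star_suffix is made up, and it assumes the caller already knows the length of the string (otherwise computing it costs a pass over the string anyway). It returns -1 when the pattern is not a single leading '*' followed by text without further '*', so the caller can fall back to wildcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fast path for patterns like "*s" or "*.jpg".
   Returns 1 for a match, 0 for no match, and -1 when the pattern does not
   have the "*tail" shape and the general matcher should be used instead. */
int match_star_suffix(const char *wild, const char *string, size_t len) {
  const char *tail;
  size_t tail_len, i;

  if (wild[0] != '*') {
    return -1;
  }
  tail = wild + 1;
  if (strchr(tail, '*') != NULL) {
    return -1;                      /* more than one '*': not handled here */
  }

  tail_len = strlen(tail);
  if (tail_len > len) {
    return 0;                       /* text is shorter than the required ending */
  }

  string += len - tail_len;         /* jump straight to the candidate ending */
  for (i = 0; i < tail_len; i++) {
    if (tail[i] != '?' && tail[i] != string[i]) {
      return 0;
    }
  }
  return 1;
}
```

The comparison touches only the last strlen(tail) characters of the string, which is where the saving over the general loop would come from.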

delphi port

vovach777

23-Dec-14 3:38

unit glob_;

interface

function WildComp(const wild, s: String): boolean;

implementation

function WildComp(const wild, s: String): boolean;
var
  i_w,i_s : integer;
  l_w,l_s : integer;
  mp_i : integer;
  cp_i : integer;
begin
  i_w := 1;
  i_s := 1;
  l_w := Length(wild);
  l_s := Length(s);
  mp_i := MAXINT;
  cp_i := MAXINT;

  while (i_s <= l_s) and (i_w <= l_w) and (wild[i_w] <> '*') do
  begin
     if (wild[i_w] <> s[i_s]) and (wild[i_w] <> '?') then
         exit(false);
     inc(i_w);
     inc(i_s);
  end;

  while i_s <= L_s do
  begin
   if (i_w <= L_w) and (wild[i_w] = '*')  then
   begin
      inc(i_w);
      if i_w > L_w then
         exit(true);
      mp_i := i_w;
      cp_i := i_s+1;
   end
   else
   if (i_w <= L_w) and ((wild[i_w] = s[i_s]) or (wild[i_w] = '?')) then
   begin
      inc(i_w);
      inc(i_s);
   end
   else
   begin
      // No '*' seen yet means there is nothing to backtrack to
      if mp_i = MAXINT then
         exit(false);
      i_w := mp_i;
      i_s := cp_i;
      inc(cp_i);
   end;
  end;

  while (i_w <= L_w) and (wild[i_w] = '*') do
    inc(i_w);

  exit(i_w > L_w);
end;


end.

modified 24-Dec-14 1:07am.

My vote of 5+

Sanmayce

29-Nov-13 7:47

Hi Mr. Handy,
it is so good to see etude developers in C. I don't know how, but I hadn't seen your function until three or four days ago.

I postponed all my activities in an attempt to come up with some gem.
Last night you kicked my ass. My amateurish interest in wildcard matching led me to write my own function (in fact a semi-port of Igor Pavlov's code).
I did my best to make it superfast, in which I succeeded, but I failed to outperform yours: your function is faster than mine for both short and long strings. BRAVO!

I already had a wildcard searcher working just fine, a very versatile one, but slow. Therefore I added a FAST add-on to my 3-in-1 searcher Kazahana, thus allowing VERSATILE (9 wildcards) and FAST (the classic 2 wildcards) modes. I also tested mine vs yours using 2 threads. In short, they are really fast: the 2 threads utilize 180-192% of the CPU, achieving 140-170MB/s TOTAL traversal speed, see further below.

Having failed to "kick your ass", I bend a knee before you, but only temporarily. I need more time to clear my sight; in the meantime it would be nice if some real programmer(s) helped me speed up my etude.
If you can speed up my function, please do so; I will appreciate your open-mindedness. Dethroning your own with your own is a sweet feeling.

Since I am fond of benchmarking and endless results logs, you are welcome to visit my dedicated Kazahana article to see them.

Being a UFC fan, I see my defeat in the light of Johny Hendricks' defeat by the champion Georges St-Pierre some weeks ago. Johny rocks, I like his style, and his interviews are worth seeing, such as:
Johny Hendricks: "I Am the Champion" (UFC 167 Post-Press Conference)

Best,
Georgi 'Kaze'

P.S.
I couldn't help it, just some of them:

The big benchmark, searching all lines in Wikipedia 1024MB dump:
My function is used in executable: Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe
Your function is used in executable: Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe

The runs are, my wildcards '&'/'+' stand for '*'/'?':
Speed results for pattern "&karolina&wydra&":

D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,160,591 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 156 KB/clock
Kazahana: Performance: 1,401 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,694/654
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,334,917,342 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.717 =    9%
User Time    =    13.041 =  178%
Process Time =    13.759 =  188%
Global Time  =     7.298 =  100%

D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,167,227 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 163 KB/clock
Kazahana: Performance: 1,459 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,428/639
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,308,754,183 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.748 =   11%
User Time    =    12.230 =  181%
Process Time =    12.979 =  192%
Global Time  =     6.729 =  100%

Get down get down get down get it on show love and give it up
What are you waiting on?

My vote of 5

Franc Morales

29-May-13 15:47

Thanks for sharing, my friend.

help required for wildcard matching * and #

SaimaAsif

23-Feb-12 23:56

I read this article and it's really amazing. I appreciate your efforts. I am a student, and I need help defining the same kind of function for my requirements. I hope I'll get a good response.

Words are strings separated by dots. Two additional characters are also valid: the *, which matches exactly 1 word, and the #, which matches 0..N words. Example: *.stock.# matches the routing keys usd.stock and eur.stock.dsf but not stock.nasdaq.

Your help would be highly appreciated.

Sam
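
The word-based rules described in this question can be sketched with a small recursive matcher that walks both strings one dot-separated word at a time. This is an illustration only, not code from the article: the names topic_match, match_words, word_len and next_word are made up, and the sketch assumes that an empty string means "zero words".

```c
#include <stddef.h>
#include <string.h>

/* Length of the word starting at s (up to the next '.' or the end). */
static size_t word_len(const char *s) {
  return strcspn(s, ".");
}

/* Start of the word after the one at s, or NULL when s was the last word. */
static const char *next_word(const char *s) {
  s += word_len(s);
  return (*s == '.') ? s + 1 : NULL;
}

/* pat and key point at the current word; NULL means "no words left". */
static int match_words(const char *pat, const char *key) {
  size_t plen, klen;

  if (pat == NULL) {
    return key == NULL;             /* pattern used up: key must be too */
  }
  plen = word_len(pat);

  if (plen == 1 && pat[0] == '#') {
    /* '#' matches zero words ... */
    if (match_words(next_word(pat), key)) {
      return 1;
    }
    /* ... or swallows one word and is tried again on the rest. */
    return key != NULL && match_words(pat, next_word(key));
  }

  if (key == NULL) {
    return 0;                       /* pattern needs a word, key has none */
  }
  klen = word_len(key);
  if (!(plen == 1 && pat[0] == '*') &&
      (plen != klen || memcmp(pat, key, plen) != 0)) {
    return 0;                       /* literal word differs */
  }
  return match_words(next_word(pat), next_word(key));
}

/* Returns 1 if key matches pattern under the '*' / '#' word rules, else 0. */
int topic_match(const char *pattern, const char *key) {
  return match_words(*pattern ? pattern : NULL, *key ? key : NULL);
}
```

With the example from the question, topic_match("*.stock.#", "usd.stock") and topic_match("*.stock.#", "eur.stock.dsf") return 1, while topic_match("*.stock.#", "stock.nasdaq") returns 0. Recursion keeps the sketch short; patterns with many '#' words may be slow on long keys.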

My vote of 5

Plamen Petrov

13-Dec-11 21:37

A very useful function!

Modification with '#' as wildcard joker for digits

Thomas Haase

25-Sep-11 23:16

First of all I like this code, it is small and fully stand-alone.
I have modified it, because I need an additional wildcard joker that represents digits. Finally the modified function accepts '*', '?' and '#' as joker characters.

#include <ctype.h>  /* for isdigit */

int wildcmp_ex(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string+1;
    } else if (((*wild == *string) && (*wild != '#')) || (*wild == '?') || ((*wild == '#') && isdigit((unsigned char)*string))) {
      wild++;
      string++;
    } else {
      if (mp)
      {
        wild = mp;
        string = cp++;
      }
      else
      {
        return 0;
      }
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

Thomas Haase

modified 29-Sep-11 8:26am.

Licence Question

randommark

23-Nov-10 0:33

Hi Jack Handy,

Is there a licence attached to this code?

Thanks, Mark

Another C# version, with a twist

Thomas Levesque

29-Jun-10 14:50

Just for fun... a C# version with almost the same syntax as the original C version :)

public static bool wildcmp(string pattern, string text) {

  var wild = new StringScanner(pattern);
  var @string = new StringScanner(text);

  var mp = wild;
  var cp = @string;

  while (@string && wild != '*') {
    if (wild != @string && wild != '?') {
      return false;
    }
    wild++;
    @string++;
  }

  while (@string) {
    if (wild == '*') {
      if (!++wild) {
        return true;
      }
      mp = wild;
      cp = @string + 1;
    } else if (wild == @string || wild == '?') {
      wild++;
      @string++;
    } else {
      wild = mp;
      @string = cp++;
    }
  }

  while (wild == '*') {
    wild++;
  }
  return !wild;
}

public struct StringScanner
{
    private string _string;
    private int _position;
    
    public StringScanner(string s)
    {
        _string = s;
        _position = 0;
    }
    
    public string String
    {
        get { return _string; }
    }
    
    public int Position
    {
        get { return _position; }
    }
        
    public bool Finished
    {
        get { return _position == _string.Length;}
    }
    
    public char Current
    {
        get { return Finished ? '\0' : _string[_position]; }
    }
    
    public bool MoveNext()
    {
        if (Finished)
            return false;
        _position++;
        return true;
    }
    
    public static StringScanner operator ++(StringScanner scanner)
    {
        scanner.MoveNext();
        return scanner;
    }
    
    public static StringScanner operator +(StringScanner scanner, int n)
    {
        return new StringScanner(scanner.String)
        {
            _position = Math.Min(scanner.Position + n, scanner.String.Length)
        };
    }
    
    public static implicit operator bool(StringScanner scanner)
    {
        return !scanner.Finished;
    }
    
    public static implicit operator char(StringScanner scanner)
    {
        return scanner.Current;
    }
    
    public static bool operator ==(StringScanner scanner1, StringScanner scanner2)
    {
        return scanner1.Current == scanner2.Current;
    }
    
    public static bool operator !=(StringScanner scanner1, StringScanner scanner2)
    {
        return scanner1.Current != scanner2.Current;
    }
}

Obscurity

Chuck O'Toole

25-Apr-10 18:18

I've been using this for years, just don't show it to your instructor.

// String match with wildcards.  Obtained from the Internet somewhere.  Case insensitive.

BOOL wm(const char *s, const char *t)
{
	return *t-'*' ? *s ? (*t=='?') | (toupper(*s)==toupper(*t)) && wm(s+1,t+1) : !*t : wm(s,t+1) || *s && wm(s+1,t);
}

If you want case sensitive, remove the toupper() calls.
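
For readers who want to follow what the one-liner does, it can be unpacked into ordinary if statements. The version below is a sketch of that unpacking, with a few assumptions of mine: the name wm_readable is made up, the return type is plain int rather than BOOL, and the arguments to toupper() are cast to unsigned char. As in the one-liner, s is the text and t is the pattern.

```c
#include <ctype.h>

/* Readable equivalent of the recursive one-liner: s is the text, t the pattern. */
int wm_readable(const char *s, const char *t) {
  if (*t == '*') {
    /* Either the '*' matches nothing (drop it from the pattern), or it
       swallows one character of the text and is tried again. */
    return wm_readable(s, t + 1) || (*s && wm_readable(s + 1, t));
  }
  if (*s == '\0') {
    /* Text used up: match only if the pattern is used up as well. */
    return *t == '\0';
  }
  if (*t == '?' ||
      toupper((unsigned char)*s) == toupper((unsigned char)*t)) {
    return wm_readable(s + 1, t + 1);
  }
  return 0;
}
```

Because every '*' can trigger two recursive calls, the running time can grow quickly for patterns with several stars on long non-matching text; the iterative function in the article avoids that by remembering only the most recent '*'.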

My C# contribution - recursive, of course!

RenniePet

26-Mar-10 5:21

This strikes me as an obvious place to use recursion. So here goes...

public class MString
{
   /// <summary>
   /// Function to compare two strings, where strA may contain wildcard characters '*' and
   /// '?'. http://en.wikipedia.org/wiki/Wildcard_character
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty, must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty, must not be null</param>
   /// <param name="ignoreCase">true = ignore upper/lower case, false = don't ignore case</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB, bool ignoreCase)
   {
      if (ignoreCase)
         return CompareWWc(strA.ToLower(), strB.ToLower());
      else
         return CompareWWc(strA, strB);
   }


   /// <summary>
   /// Recursive function to compare two strings, where strA may contain wildcard characters
   /// '*' and '?'. http://en.wikipedia.org/wiki/Wildcard_character
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty, must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty, must not be null</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB)
   {
      // Top of loop to scan across strA (and strB)
      for (int i = 0; i < strA.Length; i++)
      {
         // Special processing when we hit a '*' in strA
         if (strA[i] == '*')
         {
            // If the '*' is at the end of strA then result = true irrespective of strB
            if (i == strA.Length - 1)
               return true;

            // Do recursive calls to try to find a match somewhere to the right in strB
            strA = strA.Substring(i + 1);  // The part of strA beyond the '*'
            for (int j = i; j <= strB.Length; j++)  // <= also tries the empty tail, e.g. "a**" against "a"
               if (CompareWWc(strA, strB.Substring(j)))
                  return true;
            return false;
         }

         // Normal processing for non-'*' characters in strA
         if (i >= strB.Length || (strA[i] != strB[i] && strA[i] != '?'))
            return false;
      }

      // We've reached the end of strA and the last character is not '*'
      return strA.Length == strB.Length;
   }

}

And here's a little test sequence:

   if (!MString.CompareWWc("", ""))
      Console.WriteLine("Something wrong!");


   if (!MString.CompareWWc("something", "something"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("something", "zomething"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("something", "some"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("something", "something else"))
      Console.WriteLine("Something wrong!");


   if (!MString.CompareWWc("s?m?th???", "something"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("s?m?th???", "somethin"))
      Console.WriteLine("Something wrong!");


   if (!MString.CompareWWc("*", ""))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("*", "nonsense"))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("non*", "nonsense"))
      Console.WriteLine("Something wrong!");


   if (!MString.CompareWWc("*nonsense", "nonsense"))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("non*nse", "nonsense"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("non*nse", "nonsenze"))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("non*n?e", "nonsense"))
      Console.WriteLine("Something wrong!");


   if (!MString.CompareWWc("n*on*nse", "nonsense"))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("n*n*nse", "nonsense"))
      Console.WriteLine("Something wrong!");

   if (MString.CompareWWc("*non*nse", "nonsenze"))
      Console.WriteLine("Something wrong!");

   if (!MString.CompareWWc("n*n*n?e", "nonsense"))
      Console.WriteLine("Something wrong!");
}

By the way, the name CompareWWc means Compare With Wildcards.

Re: My C# contribution - recursive, of course!

Erwin de GRoot

29-Mar-10 1:58

Actually, the recursive function together with substring will make this slow.
I'm using this at the moment:

public static class StringExtensions
{
    public static bool WildcardMatch(this string str, string compare, bool ignoreCase)
    {
        if (ignoreCase)
            return str.ToLower().WildcardMatch(compare.ToLower());
        else
            return str.WildcardMatch(compare);
    }

    public static bool WildcardMatch(this string str, string compare)
    {
        if (string.IsNullOrEmpty(compare))
            return str.Length == 0;
        int pS = 0;
        int pW = 0;
        int lS = str.Length;
        int lW = compare.Length;

        while (pS < lS && pW < lW && compare[pW] != '*')
        {
            char wild = compare[pW];
            if (wild != '?' && wild != str[pS])
                return false;
            pW++;
            pS++;
        }

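        int pSm = -1;  // -1 means no '*' seen yet, so there is nothing to backtrack to
        int pWm = -1;
        // Keep going while text remains, even if the pattern is used up: a later
        // backtrack may still succeed (e.g. "*s" against "glass")
        while (pS < lS)
        {
            if (pW < lW && compare[pW] == '*')
            {
                pW++;
                if (pW == lW)
                    return true;
                pWm = pW;
                pSm = pS + 1;
            }
            else if (pW < lW && (compare[pW] == '?' || compare[pW] == str[pS]))
            {
                pW++;
                pS++;
            }
            else if (pWm >= 0)
            {
                pW = pWm;
                pS = pSm;
                pSm++;
            }
            else
            {
                return false;
            }
        }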
        while (pW < lW && compare[pW] == '*')
            pW++;
        return pW == lW && pS == lS;
    }
}

Depends on whether you need to optimize the last few nanoseconds out of it...

RenniePet

29-Mar-10 7:45

Hi Erwin,

Thanks for your posting. It did make me decide to investigate the situation.

I still really think this is a situation that begs for recursion. But maybe you were right that substring is not a good idea. So I made this version:

public class MString2
{
   /// <summary>
   /// Function to compare two strings, where strA may contain wildcard characters '*' and
   /// '?'. http://en.wikipedia.org/wiki/Wildcard_character
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty, must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty, must not be null</param>
   /// <param name="ignoreCase">true = ignore upper/lower case, false = don't ignore case</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB, bool ignoreCase)
   {
      if (ignoreCase)
         return CompareWWc(strA.ToLower(), 0, strB.ToLower(), 0);
      else
         return CompareWWc(strA, 0, strB, 0);
   }


   /// <summary>
   /// Function to compare two strings, where strA may contain wildcard characters '*' and
   /// '?'. http://en.wikipedia.org/wiki/Wildcard_character
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty, must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty, must not be null</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB)
   {
      // Just call the private recursive version of this function
      return CompareWWc(strA, 0, strB, 0);
   }


   /// <summary>
   /// Private recursive function used by the above two public functions.
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty, must not be null</param>
   /// <param name="indexA">index into strA marking start of the string for processing purposes</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty, must not be null</param>
   /// <param name="indexB">index into strB marking start of the string for processing purposes</param>
   /// <returns>true = match, false = non-match</returns>
   private static bool CompareWWc(string strA, int indexA, string strB, int indexB)
   {
      // Top of loop to scan across strA (and strB)
      for (int i = 0; indexA + i < strA.Length; i++)
      {
         // Special processing when we hit a '*' in strA
         if (strA[indexA + i] == '*')
         {
            // If the '*' is at the end of strA then result = true irrespective of strB
            if (indexA + i == strA.Length - 1)
               return true;

            // Do recursive calls to try to find a match somewhere to the right in strB
            for (int j = indexB + i; j <= strB.Length; j++)  // <= also tries the empty tail
               if (CompareWWc(strA, indexA + i + 1, strB, j))
                  return true;
            return false;
         }

         // Normal processing for non-'*' characters in strA
         if (indexB + i >= strB.Length || (strA[indexA + i] != strB[indexB + i] && strA[indexA + i] != '?'))
            return false;
      }

      // We've reached the end of strA and there is no '*' in strA
      return strA.Length - indexA == strB.Length - indexB;
   }

}

Then I ran some timing tests, using System.Diagnostics.Stopwatch. I put my test case with 19 calls to the function in a loop and executed it 10,000 times. I did this for my original version, your version, and my new version. I compiled the programs in Release mode.

Assuming I haven't made a mistake somewhere, here are my results for a single function call:

My original version:  342 nanoseconds
Your version:         237 nanoseconds
My second version:    279 nanoseconds

Now to tell you the truth, I find it very difficult to get excited about saving 100 nanoseconds at the expense of having two and a half times as many lines of code. Especially since my expected use of this function in my application will probably never exceed a couple hundred calls per day. :)

Anyway, thanks for getting me to think things over again and make the tests. Personally, at least in this particular case, I prefer programmer understandability to execution efficiency. I've decided to stick with my original version, since I think my second version is more difficult to understand, and the improved efficiency not worth that disadvantage.

Sorry - revised numbers

RenniePet

29-Mar-10 8:35

Hi Erwin,

Sorry - my previous numbers are not correct. I was running the programs under the Visual Studio debugger, and that was apparently not good for timing tests.

Here's what I get now:

My original version:  243 nanoseconds
Your version:          76 nanoseconds
My second version:    111 nanoseconds

Assuming these timings are valid, your version is three times faster than my original version, and that is pretty significant, at least in a situation where the function may be used millions of times a day.

Sorry for the incorrect timings in my previous posting.

Re: Depends on whether you need to optimize the last few nanoseconds out of it...

Erwin de GRoot

29-Mar-10 8:37

Yes, the recursive function makes it more understandable for sure. In my case I actually call it several thousands of times after certain user actions, so I'm even considering using unsafe code :)

I also thought of a special case where your function will take a performance hit: SearchString = "--ABC-----ABC-----ABC-----lots of text (without 'at') goes here", wildcardString = "*ABC*@". In this case my function (based on Jack's) will search for the '@' character once, starting from position 5 (but won't find it, because it's not there). Your function would search for the '@' character 3 times (once starting from position 5 until the end, once from 13 and once from 21). The longer the text at the end, or the more occurrences of 'ABC' at the start, the greater the performance hit.

Yet another version - 25% faster, I think [modified]

RenniePet

1-Apr-10 8:24

If at first you don't succeed...

Here's my third version, where I say to hell with minimizing lines of code and try to optimize the speed. No "unsafe" code though, unless you consider "goto" to be unsafe coding. :)

public class MString
{
   /// <summary>
   /// Compare two strings, where strA may contain wildcard characters '*' and '?'.
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty,
   ///                    must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty,
   ///                    must not be null</param>
   /// <param name="ignoreCase">true = ignore upper/lower case, false = observe case</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB, bool ignoreCase)
   {
      if (ignoreCase)
         return CompareWWc(strA.ToLower(), strB.ToLower());
      else
         return CompareWWc(strA, strB);
   }


   /// <summary>
   /// Compare two strings, where strA may contain wildcard characters '*' and '?'.
   ///
   /// In the comments, the word 'segment' is used to talk about the portions of strA that
   /// fall between two '*' characters, or between the start of the string and the first '*'
   /// or between the last '*' and the end of the string.
   /// </summary>
   /// <param name="strA">string which may contain wildcards, may be empty,
   ///                    must not be null</param>
   /// <param name="strB">string to compare to, no wildcard processing, may be empty,
   ///                    must not be null</param>
   /// <returns>true = match, false = non-match</returns>
   public static bool CompareWWc(string strA, string strB)
   {
      int starPtr = 0;  // Points at the '*' in strA

      // This part of the code handles the first segment in strA, or the case where strA
      //  does not contain any '*' character at all. The first segment is fairly simple to
      //  handle because it must match from the start of strB - no need to have a sliding
      //  match loop.

      // Check strB long enough so we don't need to test for hitting its end while scanning
      if (strB.Length >= strA.Length)
      {
         // Simple optimized scan of first segment of strA and comparison with strB
         for (;; starPtr++)
         {
            if (starPtr == strA.Length)
               return strA.Length == strB.Length;  // No '*' in strA and no mismatch
            if (strA[starPtr] == '*')
               goto firstSegmentMatches;
            if (strA[starPtr] != strB[starPtr] && strA[starPtr] != '?')
               return false;  // Mismatch
         }
      }
      else
      {
         // When strB is shorter than strA a match is not likely. But if strA contains
         //  enough '*' characters it is possible, so we have to give it a try.
         for (;; starPtr++)
         {
            if (strA[starPtr] == '*')
               goto firstSegmentMatches;
            if (starPtr == strB.Length)
               return false;  // No '*' in strA before end of strB encountered
            if (strA[starPtr] != strB[starPtr] && strA[starPtr] != '?')
               return false;  // Mismatch
         }
      }

      // The rest of the code handles the case where strA does contain one or more '*'
      //  characters, and the first segment does match the start of strB.

   firstSegmentMatches:

      int indexA;  // Start of segment in strA
      int indexB = starPtr;  // Sliding match location in strB

      // Loop to process the segments in strA
      while (true)
      {
         // Test if next segment is last and empty
         indexA = ++starPtr;  // Point past '*'
         if (indexA == strA.Length)
            return true;  // Last segment empty - matches irrespective of strB content

         // Scan over the next segment in strA
         for (;; starPtr++)
            if (starPtr == strA.Length || strA[starPtr] == '*')
               break;

         // Try to find match for this segment somewhere in strB
         for (;; indexB++)
         {
            if (starPtr - indexA > strB.Length - indexB)
               return false;  // Mismatch if not enough characters left in strB

            for (int i = indexA, j = indexB; i < starPtr; i++, j++)
               if (strA[i] != strB[j] && strA[i] != '?')
                  goto tryStringBAgain;

            goto findNextSegment;  // Match found for this segment in strB

         tryStringBAgain:
            continue;
         }

         // Was that last segment? Return if so, loop if not.
      findNextSegment:
         indexB += starPtr - indexA;  // Point past matching portion of strB
         if (starPtr == strA.Length)
            return indexB == strB.Length;  // Return if that was last segment
      }
   }

}

And here are my timing results (which I'm not totally sure of, I'm not used to timing code):

My original version:  243 nanoseconds    17 lines of code
Erwin's version:       76 nanoseconds    42 lines of code
My second version:    111 nanoseconds    16 lines of code
My third version:      56 nanoseconds    52 lines of code

I'd appreciate it if someone would check this out and let me know if they find any bugs or anything.

Re: Yet another version - 25% faster, I think

aleks1k

21-Sep-11 2:47

I found a small bug: comparing "*a" with "babbba" returns false.

I used this function, but how can I capture variables from the * ???

moh.hijjawi

20-Oct-09 1:55

Dear Jack,
Dear all,

I used this function to compare two strings, the first a pattern (* KK *) and the second a text (TT KK ZZ), and the function returned a match. That's brilliant, but my question is how I can edit the function to capture the characters matched by each * and save them in variables. For example:

X = TT
Y = ZZ

to deal with them later on in my system.

I have tried many times, but it's not working well so far.

So if anyone has an idea how to do that, please let me know; it will be appreciated.

Best Regards.
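
One possible way to get at the text behind each '*' is to record, for every star, where it started and where the text after it finally matched. The following is a sketch under assumptions and not the article's function: the names wildcmp_capture and star_span are made up, the loop is a variant of the article's backtracking loop, and the caller supplies the array for the results. Each '*' except the last ends up with the shortest span that lets the following pattern text match at its first opportunity.

```c
#include <stddef.h>

/* Hypothetical result type: the piece of text one '*' matched. */
typedef struct {
  const char *start;  /* first character consumed by the '*' */
  size_t len;         /* number of characters consumed (may be 0) */
} star_span;

/* Returns 1 on a match and stores one span per '*' in spans[0..*count-1].
   Returns 0 on a mismatch, or when the pattern has more than max_spans stars. */
int wildcmp_capture(const char *wild, const char *string,
                    star_span *spans, size_t max_spans, size_t *count) {
  const char *mp = NULL;   /* pattern position just after the current '*' */
  const char *seg = NULL;  /* where the text after the current '*' is being tried */
  size_t n = 0;

  while (*string) {
    if (*wild == '*') {
      if (n > 0) {
        /* Moving on to a new star makes the previous span final. */
        spans[n - 1].len = (size_t)(seg - spans[n - 1].start);
      }
      if (n == max_spans) {
        return 0;
      }
      spans[n].start = string;
      spans[n].len = 0;
      n++;
      mp = ++wild;
      seg = string;
    } else if (*wild == *string || *wild == '?') {
      wild++;
      string++;
    } else if (mp) {
      wild = mp;
      string = ++seg;      /* let the current '*' swallow one more character */
    } else {
      return 0;            /* mismatch before any '*': nothing to retry */
    }
  }

  if (n > 0) {
    spans[n - 1].len = (size_t)(seg - spans[n - 1].start);
  }
  while (*wild == '*') {   /* stars left over at the end match nothing */
    if (n == max_spans) {
      return 0;
    }
    spans[n].start = string;
    spans[n].len = 0;
    n++;
    wild++;
  }
  if (*wild) {
    return 0;
  }
  *count = n;
  return 1;
}
```

For the pattern "* KK *" and the text "TT KK ZZ" this yields two spans, "TT" and "ZZ". The spans point into the original text and are not NUL-terminated, so copy them out with their length if separate strings are needed.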

Re: I used this function, but how can I capture variables from the * ???

RenniePet

1-Apr-10 11:27

It would be easiest if you use regular expressions instead of this function.
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchcollection.aspx

any updates ?

kiquenet.com

2-Jul-09 5:12

code in C# ??

Improved matching with end-of-text

Anders Heie

11-May-09 15:20

Great code, but when trying this I realized that the following pattern is a match:

Search: ????????
Text to search: ABC

The problem is that the pattern can be LONGER than the text searched, in which case it should return a not found, but instead returns found.

Also, this example succeeds:

Search: y*n
Text to search: yessir

But of course should fail, since I'm looking for a text that ends with n

So I re-wrote your program to this, to correctly handle this situation.

bool StrWildCmp(char* wildstring, char *matchstring){

	
	char stopstring[1];
	*stopstring = 0;

	while(*matchstring) {
		if (*wildstring == '*') {
		  if (!*++wildstring) {
			return true;
		  } else {
			  *stopstring = *wildstring;
		  }
		}

		if(*stopstring) {
			if(*stopstring == *matchstring ) {
				wildstring++;
				matchstring++;
				*stopstring = 0;
			} else {
				matchstring++;
			}
		} else if((*wildstring == *matchstring) || (*wildstring == '?')) {
				wildstring++;
				matchstring++;
		} else {
			return false;
		}

		if(!*matchstring && *wildstring && *wildstring != '*') {
			// matchstring too short
			return false;
		}
	}

  return true;
}

Thanks again for the inspiration.

Re: Improved matching with end-of-text: some cases don't work properly!

roadrunner314

12-Aug-09 3:35

some cases don't work properly:

wildstring = "a*bc"
matchstring = "abbc"
should be true, but it returns false

wildstring = "a*b"
matchstring = "a"
should be false, but it returns true

wildstring = "a*?b"
matchstring = "axb"
should be true, but it returns false

wildstring = "a**b"
matchstring = "axb"
should be true, but it returns false (ok, the two ** aren't useful, but they should work)

I solved the last 3 bugs, but the first one is a bit tricky...

bool StrWildCmp(char* wildstring, char *matchstring){
   char stopstring[1];
   *stopstring = '\0';

   while(*matchstring != '\0')
   {
      if (*wildstring == '*') 
      {
         do
         {         
            wildstring++;            
         } while (*wildstring == '*');  // if a dork entered two or more * in a row 
                                        // ignore them and go ahead
         
         if (*wildstring == '\0')   // if * was the last char, the strings are equal
         {
            return TRUE;
         }
         else
         {
            *stopstring = *wildstring; // the next char to check after the *
         }
      }

      if(*stopstring != '\0')
      {
         if((*stopstring == *matchstring) || (*stopstring == '?') ) 
         {
            wildstring++;
            *stopstring = '\0';
         }
         matchstring++;
      }
      else
         if((*wildstring == *matchstring) || (*wildstring == '?'))
         {
            wildstring++;
            matchstring++;
         }
         else
         {
            return FALSE;
         }

      if( (*matchstring == '\0') && (*wildstring != '\0') )
      {
         // matchstring seems to be too short. Check if wildstring has any more chars except '*'
         while (*wildstring == '*') // ignore following '*'
            wildstring++;
         
         if (*wildstring == '\0') // if wildstring ended after '*', strings are equal
            return TRUE;
         else
            return FALSE;
      }
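   }

   // The posted code stops before this point. Assumed intent: once the text is
   // used up, skip any trailing '*' and accept only if the pattern is used up too.
   while (*wildstring == '*')
      wildstring++;
   return (*wildstring == '\0');
}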
