
Wildcard string compare (globbing)

Jack Handy


15 Feb 2005


Matches a string against a wildcard string such as "*.*" or "bl?h.*" etc. This is good for file globbing or to match hostmasks.

Usage:

This is a fast, lightweight, and simple pattern matching function.

if (wildcmp("bl?h.*", "blah.jpg")) {
  //we have a match!
} else {
  //no match =(
}

Function:

int wildcmp(const char *wild, const char *string) {
  // Written by Jack Handy - jakkhandy@hotmail.com
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string+1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
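
To make the matching rules concrete, here is a small set of checks. This is only a sketch: the function body is repeated so the snippet compiles on its own, and `wildcmp_examples` is a hypothetical helper name that does not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Same algorithm as above, repeated so this snippet compiles on its own. */
int wildcmp(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

/* Hypothetical helper: call it from main() to run the checks. */
void wildcmp_examples(void) {
  assert(wildcmp("bl?h.*", "blah.jpg"));    /* '?' matches exactly one char */
  assert(wildcmp("*.*", "archive.tar.gz")); /* '*' matches any run of chars */
  assert(wildcmp("*", ""));                 /* '*' also matches nothing */
  assert(!wildcmp("bl?h.*", "blh.jpg"));    /* '?' cannot match zero chars */
  assert(!wildcmp("*.txt", "notes.text"));  /* the suffix must match exactly */
  assert(!wildcmp("BLAH.*", "blah.jpg"));   /* comparison is case-sensitive */
}
```

The pattern is always the first argument and the string being tested is the second.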

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By

Jack Handy

Web Developer

United States


Comments and Discussions

Re: Why make 3 loop ?

Jack Handy, 13-Feb-05 10:02

DarkYoda Mickael wrote:
I don't understand why you use 3 loops to do it.

I may not be seeing every case, but to me it looks like the first 2 loops do all the work.

The third loop:

while (*wild == '*') {
    wild++;
}

is there to take care of trailing *'s. Since * means 0 or more chars, "test*" should match "test" just fine. That loop takes care of this case.

-Jack

There are 10 types of people in this world, those that understand binary and those who don't.
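
The role of that third loop can be seen in isolation: once the string is exhausted, the match succeeds only if whatever is left of the pattern is nothing but `*` characters. A minimal sketch, assuming a hypothetical helper name `rest_is_only_stars` that is not part of the original code:

```c
#include <assert.h>

/* Hypothetical helper isolating the third loop: returns 1 when the
   remaining pattern is empty or consists solely of '*' characters. */
int rest_is_only_stars(const char *wild) {
  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
```

For "test*" against "test", the first loop consumes "test" and leaves "*" as the remaining pattern, which this check accepts. A remaining "*a" or "?" would be rejected, because both still require at least one more character.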

Sancy, 26-Oct-04 6:23

Hi, I have a stupid question: could someone give me the C# version? :)

thanks in advance

Psyk66, 21-Dec-04 3:39

private bool wildcmp(string wild, string str)
{
    int cp=0, mp=0;

    int i=0;
    int j=0;
    while ((i<str.Length) && (wild[j] != '*'))
    {
        if ((wild[j] != str[i]) && (wild[j] != '?'))
        {
            return false;
        }
        i++;
        j++;
    }

    while (i<str.Length)
    {
        if (j<wild.Length && wild[j] == '*')
        {
            if ((j++)>=wild.Length)
            {
                return true;
            }
            mp = j;
            cp = i+1;
        }
        else if (j<wild.Length && (wild[j] == str[i] || wild[j] == '?'))
        {
            j++;
            i++;
        }
        else
        {
            j = mp;
            i = cp++;
        }
    }

    while (j<wild.Length && wild[j] == '*')
    {
        j++;
    }
    return j>=wild.Length;
}

This C# version works. I'm sure there are loads of improvements to be made though. Don't flame me for such bad code, I only started C# yesterday;)

Ionut Filip, 22-Feb-05 6:15

A small fix:
while ((i<str.Length) && (wild[j] != '*'))
should be
while (i < str.Length && j < wild.Length && wild[j] != '*')

And a small improvement for case sensitivity:
private bool wildcmp(string wild, string str, bool case_sensitive)
{
    if (! case_sensitive)
    {
        wild = wild.ToLower();
        str = str.ToLower();
    }

    // rest of the code is the same
}

Ionut Filip

robagar, 3-Apr-06 16:58

hiya

Just thought I'd share my version of this code

- put the whole shebang into a class with public static methods
- fixed a bug where the pattern '?' matches all strings
- added an early-exit test for patterns that don't actually contain wildcards so it just defaults to normal string comparison

cheers
Rob

/// <summary>
/// Class providing wildcard string matching.
/// </summary>
public class Wildcard
{
    private Wildcard()
    {
    }

    /// <summary>
    /// Array of valid wildcards
    /// </summary>
    private static char[] Wildcards = new char[]{'*', '?'};

    /// <summary>
    /// Returns true if the string matches the pattern which may contain * and ? wildcards.
    /// Matching is done without regard to case.
    /// </summary>
    /// <param name="pattern"></param>
    /// <param name="s"></param>
    /// <returns></returns>
    public static bool Match(string pattern, string s)
    {
        return Match(pattern, s, false);
    }

    /// <summary>
    /// Returns true if the string matches the pattern which may contain * and ? wildcards.
    /// </summary>
    /// <param name="pattern"></param>
    /// <param name="s"></param>
    /// <param name="caseSensitive"></param>
    /// <returns></returns>
    public static bool Match(string pattern, string s, bool caseSensitive)
    {
        // if not concerned about case, convert both string and pattern
        // to lower case for comparison
        if (!caseSensitive)
        {
            pattern = pattern.ToLower();
            s = s.ToLower();
        }

        // if pattern doesn't actually contain any wildcards, use simple equality
        if (pattern.IndexOfAny(Wildcards) == -1)
            return (s == pattern);

        // otherwise do pattern matching
        int i=0;
        int j=0;
        while (i < s.Length && j < pattern.Length && pattern[j] != '*')
        {
            if ((pattern[j] != s[i]) && (pattern[j] != '?'))
            {
                return false;
            }
            i++;
            j++;
        }

        // if we have reached the end of the pattern without finding a * wildcard,
        // the match must fail if the string is longer or shorter than the pattern
        if (j == pattern.Length)
            return s.Length == pattern.Length;

        int cp=0;
        int mp=0;
        while (i < s.Length)
        {
            if (j < pattern.Length && pattern[j] == '*')
            {
                if ((j++)>=pattern.Length)
                {
                    return true;
                }
                mp = j;
                cp = i+1;
            }
            else if (j < pattern.Length && (pattern[j] == s[i] || pattern[j] == '?'))
            {
                j++;
                i++;
            }
            else
            {
                j = mp;
                i = cp++;
            }
        }

        while (j < pattern.Length && pattern[j] == '*')
        {
            j++;
        }

        return j >= pattern.Length;
    }
}

Sancy, 5-Jun-06 16:01

Thanks a lot. This is just what I've been looking for. :)

And it fades like the shadow in the night.

PhoeniX

Converted to Java, based on the C# version [modified, better look]

quangtin3, 21-Mar-08 21:13

Java version
We (Qn & Qg) just used search and replace to produce this Java version:

public static boolean matcher(String value, String pattern) {
    if (pattern == null || value == null) {
        return false;
    }

    char[] Wildcards = new char[]{'*', '?'};

    pattern = pattern.toLowerCase();
    value = value.toLowerCase();

    // if pattern doesn't actually contain any wildcards, use simple equality
    if (pattern.indexOf(Wildcards[0]) == -1 && pattern.indexOf(Wildcards[1]) == -1) {
        return value.equals(pattern);
    }

    // otherwise do pattern matching
    int i = 0;
    int j = 0;
    while (i < value.length() && j < pattern.length() && pattern.charAt(j) != '*') {
        if (pattern.charAt(j) != value.charAt(i) && pattern.charAt(j) != '?') {
            return false;
        }
        i++;
        j++;
    }

    // if we have reached the end of the pattern without finding a * wildcard,
    // the match must fail if the String is longer or shorter than the pattern
    if (j == pattern.length()) {
        return value.length() == pattern.length();
    }

    int cp = 0;
    int mp = 0;
    while (i < value.length()) {
        if (j < pattern.length() && pattern.charAt(j) == '*') {
            if ((j++) >= pattern.length()) {
                return true;
            }
            mp = j;
            cp = i + 1;
        }
        else if (j < pattern.length() && (pattern.charAt(j) == value.charAt(i) || pattern.charAt(j) == '?')) {
            j++;
            i++;
        }
        else {
            j = mp;
            i = cp++;
        }
    }

    while (j < pattern.length() && pattern.charAt(j) == '*') {
        j++;
    }

    return j >= pattern.length();
}

Unit test

public void testmatcher() {
      System.out.println("testmatcher");

      String[][] matchPaire = {
          {"", ""},
          {"aa", "aa"},
          {"aa", "*"}, //value,pattern
          {"a", "?"},
          {"sdwerporasl;df", "*"},
          {"absdf zzzy", "*zzy"},
          {"abc", "*?"}};

      String[][] notMatchPaire = {
          {"", "?"},
          {"ab", "?"},
          {null, null},
          {"", "*a"},
          {"bsadfasdfwer234", "a*"},
          {"a fwer234", "*a"},
          };

      for (int i = 0; i < matchPaire.length; i++) {
          System.out.print("paire " + matchPaire[i][0] + " " + matchPaire[i][1]);
          assertTrue(ExchUtils.matcher(matchPaire[i][0], matchPaire[i][1]));
          System.out.println(" ok");
      }

      for (int i = 0; i < notMatchPaire.length; i++) {
          System.out.print("paire " + notMatchPaire[i][0] + " " + notMatchPaire[i][1]);
          assertFalse(ExchUtils.matcher(notMatchPaire[i][0], notMatchPaire[i][1]));
          System.out.println(" ok");
      }
  }

thank you all.

ktmt's member.

modified on Sunday, March 30, 2008 12:42 PM

Re: C# version - an error!

Mark T., 4-Jul-08 14:37

Be aware that there is a bug in this C# version.
I am still working on figuring it out fully, but:

in this code segment

int cp=0;
int mp=0;
while (i < s.Length)
{
  if (j < pattern.Length && pattern[j] == '*')
  {
    if ((j++)>=pattern.Length)
      return true;

Going into the final "if" line shown here, the maximum value that j may have is (pattern.length-1), due to the first "if" test. Then we see (j++) compared. But, the value of (j++) is the value of "j" BEFORE being incremented and thus is a maximum of (pattern.length-1) and is therefore NEVER >= pattern.length. Only after the if test is completed is j actually incremented.
So the following return is never taken.

Perhaps it can be fixed by changing j++ to ++j... but I can't tell that until I complete the analysis.

On a slightly different topic, I will state my opinion as a professional programmer. This demonstrates the extreme importance of EXTENSIVE COMMENTS in code explaining NOT what the code does, but "what the code is supposed to do" in each section. If such comments were in place, this would be an easy maintenance fix. Without them, I am having to analyze what the code DOES and, from that, try to discern what the programmer INTENDED the code to do. And I have to consider all the possible wildcard permutations just like the original programmer did. I essentially have to reinvent the wheel... because the user manual is missing.

Everyone, especially gurus, should put extensive comments in their code on "what it is intended to do". The only downside is lack of job security, because now someone other than you can fix the code. If you have that low an opinion of your worth to your employer, and are also lacking all compassion for others, then don't comment your code.

williamhix, 17-Oct-08 22:28

I think this:

if ((j++) >= pattern.Length) 
{
  return true;
}

Needs to change to this:

if (++j >= pattern.Length)
{
 return true;
}

Otherwise the early break does not happen and the whole string is searched.
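
Post-increment and pre-increment behave the same way in C as in the C# code discussed here, so the difference between the two tests can be sketched in C. The function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Post-increment: the comparison sees the OLD value of j. */
int early_exit_post(int j, int length) {
  return (j++) >= length;
}

/* Pre-increment: j is incremented first, then compared. */
int early_exit_pre(int j, int length) {
  return ++j >= length;
}
```

With a pattern of length 3 whose last character is the `*` (so `j` is 2 when the test runs), `early_exit_post(2, 3)` yields 0 and `early_exit_pre(2, 3)` yields 1. Only the pre-increment form takes the early return for a trailing `*`.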

Many thanks, with 1 small gripe ..

David Patrick, 29-Sep-04 8:41

most C compare functions return zero when the values are equal, but this function returns non-zero.

Personally, I find the non-zero to be more intuitive .. but after years of forcing myself to check for zero I find it a bit counter-intuitive.

I think I'll just rename the function when I add it to my library :)

But that certainly won't stop me from using this wonderful routine.

Many sincere thanks ...

Re: Many thanks, with 1 small gripe ..

Jack Handy, 6-Oct-04 8:13

David Patrick wrote:
most C compare functions return zero when the values are equal, but this function returns non-zero.

You make a good point. I probably should have made it behave like the strcmp() type functions. I'm a bit afraid to change it at this point since it has been posted for so long. It should be an easy fix for you or anyone else who is used to C style string comparisons. The C++ people here probably like the current behavior I would imagine.

-Jack


Re: Many thanks, with 1 small gripe ..

Vic Mackey, 16-Oct-04 19:33

I disagree. The return value for strcmp() is more than simply a test for equality, it tells you which string is greater than the other. A zero return value for strcmp() makes sense, but not for wildcmp() since the return value is strictly boolean, match or no match. The current implementation is fine (although some people might be picky about the return type, int vs bool). Perhaps to avoid confusion with string comparison functions, the function should be renamed to wildmatch() or something similar.
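
The distinction drawn here can be sketched in C. `strcmp()` reports an ordering with three possible outcomes, while a match test has only two. The helper names `order_of` and `same_text` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* strcmp() is a three-way comparison: negative, zero, or positive.
   This helper normalises the result to -1, 0, or +1. */
int order_of(const char *a, const char *b) {
  int r = strcmp(a, b);
  return (r > 0) - (r < 0);
}

/* A yes/no question has only two outcomes, so a boolean return fits. */
bool same_text(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}
```

A wildcard match is a question of the second kind, which is why a true-on-match return value reads naturally for it.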

Re: Many thanks, with 1 small gripe ..

Voja Intermajstor, 24-Nov-04 23:26

You are completely right, Vic.

It is interesting that I have renamed the function in my code to wildmatch(). ;)

It would be a good new name.

Regards, Voja

Voja Intermajstor, 25-Aug-04 2:30

This is really nice and useful code. I once wrote something similar, but your example is simpler and shorter.
Because it lacks comments, I spent some time understanding it (before I saw the comment from Targys :| - a real tutorial ;) ) and it is clear now. Thanks to both of you!

To 'wise' guys, flamers, and other people who have nothing better to do than argue:
- If the code has a bug, report it, but don't pretend you are a genius or a guru. If you can do it better, submit an article.
If you don't like the code, don't use it!

And about NULL pointers:
Idiot-proofing should be implemented at the level where data (function arguments) is acquired and prepared, not in such low-level function.
Besides that, I tested several functions from string.h with NULL parameters and every single one threw an exception. No further comments...

Regards, Voja

Slight efficiency improvement

GKarRacer, 9-Jul-04 6:53

Great piece of code, but I have one minor improvement. It appears to me that the variable "cp" doesn't do anything and serves no purpose.

If I'm correct, then you can safely remove the line:
cp = string+1;

and also remove:
string = cp;

and replace:
cp = string++;

with:
++string;

I believe the results would be identical.

Re: Slight efficiency improvement

GKarRacer, 9-Jul-04 7:19

Maybe I posted too soon. I didn't think there was a way for cp to not equal string+1. But, after thinking about it some more I found a pattern type that would:

*???c*

It's interesting how the loop keeps shifting back and forth with this type of pattern.

However, using a test string of "testing" with the above pattern the match still failed (correctly) using both algorithms. But, there easily may be a pattern and string combination that wouldn't work without cp.
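
One combination that does appear to need cp is the pattern "*ab" against the string "aab". The sketch below is the article's algorithm with an added, hypothetical `use_cp` switch (not part of the original code), so that both behaviours can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>

/* The article's algorithm with a switch: when use_cp is 0, the mismatch
   branch only advances string, as proposed in the post above. */
int wildcmp_variant(const char *wild, const char *string, int use_cp) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      if (use_cp) {
        string = cp++;  /* resume one past the previous restart point */
      } else {
        ++string;       /* no backtracking: skips candidate positions */
      }
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
```

With cp, the partial match of "a" at position 0 fails on the second character and the search resumes at position 1, where "ab" matches. Without cp, the search resumes at position 2, after the character that caused the mismatch, so the match starting at position 1 is never tried and the function returns 0.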

PathMatchSpec (shlwapi.h)?

peterchen, 28-Jun-04 6:56

Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.

we are here to help each other get through this thing, whatever it is Vonnegut jr.

sighist || Agile Programming | doxygen

Re: PathMatchSpec (shlwapi.h)?

Jack Handy, 13-Feb-05 9:56

peterchen wrote:
Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.

The wildcmp() function is meant to be lightweight and fast.

If the extra functionality of multiple specs is needed and you don't want to parse the input yourself then you can go ahead and use the PathMatchSpec() API.

Just make sure you don't mind these limitations:

1. Adding another dependency to your executable by including the lib
2. Not portable (wildcmp() compiles fine under unix)
3. More memory overhead (larger code footprint)
4. The horrible slowness

I have run some benchmarks and pasted the results below. I can provide the .cpp file for the benchmarks if anyone is interested.

-Jack

10MM iterations.
Compiled as a console app using vc6 in release mode with /O2 optimization.
Ran on a pM 1.7ghz

"C:\\T*s*.t?t", "C:\\Test\\File.txt"
PathMatchSpec MATCH => 20.5090s
wildcmp MATCH => 1.0320s

"C:\\T*s*.t??t", "C:\\Test\\File.txt"
PathMatchSpec NO MATCH => 45.7760s
wildcmp NO MATCH => 0.8910s


Doesnt seem to work well..

bikram singh, 13-May-04 1:56

Tried these wildcards, and they show different results in your code and in Windows Explorer's search command.

??x*
*so*
??so*
??so??

Lack of comments in the code also makes it a bit difficult to understand. On the whole however, good job!

Bikram

Re: Doesnt seem to work well..

Jack Handy, 21-Jun-04 9:13

The Windows Explorer search is not a straight wildcard match. It essentially adds *'s to either end of your input string so "b?r" matches "foobar.txt". I take a more literal approach. This function does not presume to be smarter than the caller. It is not only meant for files, it is also very useful for checking hostmasks for example.

If you think my function is producing bad results, can you paste an example of the string along with the wildcard?

Thanks,

Jack
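
A caller who wants the looser search-box behaviour described above can add the surrounding stars before calling the function. The sketch below assumes a hypothetical wrapper name, `wildcmp_contains`, and only approximates that behaviour. It is not a description of how Windows Explorer is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The article's function, repeated so this snippet compiles on its own. */
int wildcmp(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

/* Hypothetical wrapper: behaves as if the caller had typed "*<wild>*". */
int wildcmp_contains(const char *wild, const char *string) {
  size_t len = strlen(wild);
  char *padded = malloc(len + 3);  /* two '*' plus the terminating NUL */
  int result;

  if (padded == NULL) {
    return 0;                      /* treat allocation failure as no match */
  }
  padded[0] = '*';
  memcpy(padded + 1, wild, len);
  padded[len + 1] = '*';
  padded[len + 2] = '\0';

  result = wildcmp(padded, string);
  free(padded);
  return result;
}
```

With this wrapper, "b?r" matches "foobar.txt", whereas the plain call `wildcmp("b?r", "foobar.txt")` returns 0 because the pattern must cover the whole string.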


Case Insensitive wildcmp

Member 951703, 16-Mar-04 9:36

I want a case-insensitive wildcmp function. Could anyone help me?

Re: Case Insensitive wildcmp

Neville Franks, 16-Mar-04 9:52

Simply wrap the code that compares characters in toupper() calls.

eg.

if ((toupper(*wild) != toupper(*string)) && (*wild != '?')) {

Neville Franks, Author of ED for Windows www.getsoft.com and coming soon: Surfulater www.surfulater.com
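
Applied to the whole function, that suggestion might look like the sketch below. Both places that compare `*wild` with `*string` need the change, not only the one quoted above. The names `wildcmp_nocase` and `same_char_nocase` are hypothetical, and the cast to `unsigned char` is an addition that avoids undefined behaviour when `char` is signed and holds a negative value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compares two chars ignoring case. */
static int same_char_nocase(char a, char b) {
  return toupper((unsigned char)a) == toupper((unsigned char)b);
}

/* Case-insensitive variant of the article's wildcmp(). */
int wildcmp_nocase(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if (!same_char_nocase(*wild, *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if (same_char_nocase(*wild, *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
```

Because `toupper()` follows the current C locale, characters outside plain ASCII are only folded if the locale and encoding in use support them.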

Re: Case Insensitive wildcmp

Techiex, 23-Mar-04 9:33

toupper doesn't support char characters. You could optimize this code.
This code also compares multilingual characters:

#include <stdio.h>

#define BIT5 0x20

char buf[] = "this is ®Ñê test";
char *pbuf;

int lower(int ch);

/* Sets bit 5, which maps 'A'-'Z' to 'a'-'z' and, in a single-byte
   encoding such as Latin-1, most accented capitals to lowercase.
   '@', '[', '\' and ']' (codes 64, 91, 92, 93) are returned unchanged.
   Other characters without bit 5, such as '^', '_' and control
   characters, are altered as well. */
int lower(int ch)
{
    if ((ch==64)||(ch==91)||(ch==92)||(ch==93))
        return ch &= ~(ch & BIT5);
    return ch |= BIT5;
}

int wildcmp(char *wild, char *string) {
    char *cp = NULL, *mp = NULL;

    while ((*string) && (*wild != '*')) {
        if ((*wild != *string) && (*wild != '?')) {
            return 0;
        }
        wild++;
        string++;
    }

    while (*string) {
        if (*wild == '*') {
            if (!*++wild) {
                return 1;
            }
            mp = wild;
            cp = string+1;
        } else if ((*wild == *string) || (*wild == '?')) {
            wild++;
            string++;
        } else {
            wild = mp;
            string = cp++;
        }
    }

    while (*wild == '*') {
        wild++;
    }
    return !*wild;
}

int main() {
    /* lowercase the subject string in place before matching */
    for (pbuf = &buf[0]; *pbuf; ++pbuf)
        *pbuf = lower(*pbuf);

    if (wildcmp("*®ñê*", buf)) {
        printf("match : %s \n", buf);
    } else {
        printf("not match: %s \n", buf);
    }
    return 0;
}

Re: Case Insensitive wildcmp

David Crow, 23-Feb-05 2:24

Techiex wrote:
toupper doesnt support char characters

Since when?

ASSERT('A' == toupper('a'));

"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow

Re: Case Insensitive wildcmp

Vic Mackey, 23-Feb-05 8:00

Be careful when using toupper(): some of the CRT variations of this function _only_ work when the input is known to be lowercase. For example, the return value is invalid for _toupper('A') and for some implementations of toupper('A') as well. Check the documentation.
