
Wildcard string compare (globbing)

By Jack Handy, 15 Feb 2005
 

Usage:

This is a fast, lightweight, and simple pattern matching function.

if (wildcmp("bl?h.*", "blah.jpg")) {
  //we have a match!
} else {
  //no match =(
}

Function:

int wildcmp(const char *wild, const char *string) {
  // Written by Jack Handy - jakkhandy@hotmail.com
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string+1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
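
As a rough illustration of one way the function might be used, the sketch below filters a list of file names against a pattern. It repeats wildcmp so that it compiles on its own; count_matches is a name made up for this example and is not part of the article.

```c
#include <stddef.h>

/* wildcmp as given in the article, repeated so the sketch is self-contained. */
int wildcmp(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while ((*string) && (*wild != '*')) {
    if ((*wild != *string) && (*wild != '?')) {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if ((*wild == *string) || (*wild == '?')) {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

/* Hypothetical helper: counts how many of the n names match the pattern. */
size_t count_matches(const char *pattern, const char *const *names, size_t n) {
  size_t hits = 0;
  for (size_t i = 0; i < n; i++) {
    if (wildcmp(pattern, names[i])) {
      hits++;
    }
  }
  return hits;
}
```

Note that "bl?h.*" requires the '.' to be present, so a bare "blah" is not counted, while "*" counts every name.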

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Jack Handy
Web Developer
United States
No Biography provided


Comments and Discussions

 
GeneralUsing in Artistic Stylememberjimp023-Apr-08 4:43 
I am using this in Artistic Style, a popular multi-platform code formatter available at SourceForge.
 
http://astyle.sourceforge.net/
 
Release 1.22 added directory recursion to the project. Wildcard processing was made internal to the program. Linux has a glob function but Windows doesn't. I just used this for both of them. It let me process both platforms in a similar manner.
 
A minor change was made for Windows to make the comparison case insensitive. Linux was left case sensitive.
 
Thanks for making it available. Using this was a lot easier than writing my own. I doubt that mine would have been this sophisticated.
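
The exact change made in Artistic Style is not shown in this thread, so the following is only a guess at what a case-insensitive variant could look like: the same loops as wildcmp, with the two character comparisons routed through tolower. The names wildcmp_nocase and same_char_nocase are invented for this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Compares two chars ignoring case. The casts to unsigned char avoid
   undefined behavior in tolower for negative char values. */
static int same_char_nocase(char a, char b) {
  return tolower((unsigned char)a) == tolower((unsigned char)b);
}

/* Assumed variant of wildcmp: identical control flow, case-insensitive compare. */
int wildcmp_nocase(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while (*string && *wild != '*') {
    if (!same_char_nocase(*wild, *string) && *wild != '?') {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if (same_char_nocase(*wild, *string) || *wild == '?') {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
```

With this variant, a pattern such as "*.JPG" would accept "photo.jpg". tolower follows the current C locale and only handles single-byte characters.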
GeneralGeez...memberlarryfr5-Mar-08 9:39 
D'oh! Boy, do I feel stupid. I worked on an algorithm like this for days and never got it quite right. Then I see the wonderfully simple work of someone like this, and it reminds me that sometimes we are all guilty of 'over-engineering'...
 
Thanks Mr. Handy!
QuestionConvert to a replace?memberwilliaps20-Mar-07 8:31 
How can this code be converted to do a replace? I need to provide a find/replace dialog in an application and I don't want to jump through the hoops of the Boost library. Can anyone help?
 
Patrick
GeneralC# RegExp versionmemberspinsane4-Nov-06 6:30 
Here's a RegExp version (it should be easy to port to C++).
Pros: more readable; relies on the proven RegExp engine.
Cons: maybe slower; if the pattern contains RegExp metacharacters other than '.', the result may be unexpected.
 

public static bool Match(string eval, string pattern, bool caseSensitive)
{
    bool match = false;

    // Make input parameters lower-case if case is not an issue
    if (!caseSensitive)
    {
        eval = eval.ToLower();
        pattern = pattern.ToLower();
    }

    // Escape regexp special character in pattern
    pattern = pattern.Replace(".", @"\.");

    // Replace valid wildcards with regexp equivalents
    pattern = pattern.Replace('?', '.').Replace("*", ".*");

    // Add boundaries to pattern
    pattern = @"\A" + pattern + @"\z";

    // Search for a match
    try
    {
        match = Regex.IsMatch(eval, pattern);
    }
    catch /* (ArgumentException ex) */
    {
        // Syntax error in the regular expression
    }

    // Return result
    return match;
}

GeneralKudosmemberquantumred14-Oct-06 4:37 
This is tight and clever. Thanks for sharing it.
GeneralRe: Kudosmembermilkplus24-Feb-10 11:19 
I agree. This is excellent.
Generalwildcmp("*<*>", "<field1><field2>") not working [modified]memberDaniel B.6-Sep-06 13:14 
Hi,
 
wildcmp("*<*>", "<field1><field2>") returns 1, while I think it should return 0 (I may be wrong, so please tell me).

If someone knows how to fix it, I would appreciate it.
 
Regards

GeneralRe: wildcmp("*<*>", "<field1><field2>") not workingmemberradboudp16-Feb-07 0:35 
Sure it matches. The first '*' matches '<field1>'. '<*>' matches '<field2>'.
 
Regards,
Radboud
Generalreturn value typememberwdx048-Jan-06 15:49 
I think it would be better to make the function return a bool value. Anyway, many string comparison functions return 0 when the strings are equal.
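
One way to act on this suggestion without touching the original function is a thin C99 wrapper that returns bool. This is only a sketch; wild_matches is a hypothetical name, and wildcmp is repeated from the article so the block stands alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* wildcmp as given in the article: returns 1 on a match, 0 otherwise. */
int wildcmp(const char *wild, const char *string) {
  const char *cp = NULL, *mp = NULL;

  while (*string && *wild != '*') {
    if (*wild != *string && *wild != '?') {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if (*wild == *string || *wild == '?') {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}

/* Hypothetical wrapper: true means "matches", unlike strcmp where 0 means equal. */
bool wild_matches(const char *wild, const char *string) {
  return wildcmp(wild, string) != 0;
}
```

The name makes the polarity explicit, so a caller is less likely to confuse it with the strcmp convention.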
General*? case matchmembertalimu3-Nov-05 23:42 
If wild = "*?.abc" and str = "abc.abc",
wildcmp(wild, str) does not work,

but if wild = "?*.abc" and str = "abc.abc",
wildcmp(wild, str) does work.

Does anyone have any idea about this case?
GeneralRe: *? case matchmemberkuhnm15-Sep-06 2:18 
I'm having similar problems with "*Hallo 200? ueberalles*.ddd".
It doesn't work. I think that once the first * is finished, it does not expect another wildcard to follow in the pattern.
GeneralRe: *? case matchmemberkuhnm18-Sep-06 4:48 
Ignore my last message;
as usual, the problem sits in front of the screen.
(I mixed a project built with multibyte chars with this code, which handles only plain chars. And of course I used an umlaut instead of 'ue' in my tests, so no wonder it crashed after the '?'.)
I'm very sorry!
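
For projects that use wide characters, one possible approach is a wchar_t copy of the function, so that '?' consumes one wide character rather than one byte. The sketch below is an assumption about how that could be done, not something posted in this thread, and wildcmp_w is an invented name. On platforms where wchar_t is 16 bits, characters outside the Basic Multilingual Plane still occupy two units.

```c
#include <stddef.h>
#include <wchar.h>

/* Assumed wide-character variant of wildcmp: same loops, wchar_t units. */
int wildcmp_w(const wchar_t *wild, const wchar_t *string) {
  const wchar_t *cp = NULL, *mp = NULL;

  while (*string && *wild != L'*') {
    if (*wild != *string && *wild != L'?') {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == L'*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if (*wild == *string || *wild == L'?') {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == L'*') {
    wild++;
  }
  return !*wild;
}
```

Here an umlaut such as L"\u00fc" is a single element, so a pattern like L"?" matches it as one character.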
GeneralGets my 5memberFranc Morales18-Oct-05 17:05 
Simple, fast, useful, AND fun to figure out.
 
Well done.
Generalmp and cpmembertwopieman15-Mar-05 11:59 
I got the overall flow of the program, but I didn't completely get the logic of the second loop. I understand that the second loop checks whether there is nothing after the *, in which case it is a match, but if there is something, it stores the positions in the two pointers and then goes on.
Also, the final else goes like this:
else
{
wild = mp;
string = cp++;
}
I'm sorry, but I'm not getting the logic totally.
Can someone please explain?
GeneralRe: mp and cpmemberradboudp16-Feb-07 1:14 
In case you are matching something like the following:
 
"*.abc" to "ab.de.abc"
 
In the second loop it looks for the first character in the string that is the same as the character after the asterisk. At first it matches "*" against "ab"; mp = ".abc" during this. Now wild = ".abc" and string = ".de.abc". Taken as a whole these obviously do not match, but on the next iteration the first characters do match (both '.'), so wild becomes "abc" and string "de.abc". On the iteration after that there is no match and it falls through to the else. Here it resets wild to the last mp (mask pattern??) and string to the last cp (character pattern) WITHOUT THE FIRST CHARACTER. (It actually advances cp one position.)

Why does it do this? After matching the * against part of the string and encountering a possible position at which to match the remainder of the pattern, it continued comparing characters from both strings. This failed. Since there was a * right before the position of mp, it is still allowed to add characters to the part that is matched against that *. Basically, it goes back to that position but decides that the character that occurs in both strings is not the next character in the pattern but part of the '*' wildcard.
 
In the end it has matched '*' with 'ab.de'.
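
To watch the backtracking described above, one option is an instrumented copy of the function that counts how often the final else branch runs. This is a sketch for experimentation only; wildcmp_count and its backtracks parameter are names invented here.

```c
#include <stddef.h>

/* Same control flow as wildcmp; *backtracks receives the number of times
   the final else branch ran. */
int wildcmp_count(const char *wild, const char *string, int *backtracks) {
  const char *cp = NULL, *mp = NULL;
  *backtracks = 0;

  while (*string && *wild != '*') {
    if (*wild != *string && *wild != '?') {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;       /* pattern position just after the '*' */
      cp = string + 1; /* where to resume if the next attempt fails */
    } else if (*wild == *string || *wild == '?') {
      wild++;
      string++;
    } else {
      /* Mismatch: let the last '*' absorb one more character and retry. */
      (*backtracks)++;
      wild = mp;
      string = cp++;
    }
  }

  while (*wild == '*') {
    wild++;
  }
  return !*wild;
}
```

For "*.abc" against "ab.de.abc" this returns 1 with a count of 5, one for each character of "ab.de" that the '*' ends up covering.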
GeneralOK, but ...memberSam Levy16-Feb-05 4:48 
what was changed?
QuestionWhy make 3 loop ?memberDarkYoda Mickael2-Feb-05 22:22 
Hello,

I think this post is very interesting because the code is very simple and does very cool work!

BUT!

I don't understand why you need 3 loops to do it.

I think I'm not seeing all the cases, because to me it looks as if 2 loops alone do all the work?

I'm trying to understand the whole process so that I can add an optional char with the ^ escape sequence, for example: ^-* matches -12 or 12 ;)

Thanks
AnswerRe: Why make 3 loop ?memberJack Handy13-Feb-05 10:02 
DarkYoda Mickael wrote:
I don't understand why you need 3 loops to do it.

I think I'm not seeing all the cases, because to me it looks as if 2 loops alone do all the work?

 
The third loop:
 
while (*wild == '*') {
    wild++;
}

 
is there to take care of trailing *'s. Since * means 0 or more chars, "test*" should match "test" just fine. That loop takes care of this case.
 
-Jack
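
A small experiment can make the role of the third loop visible. The sketch below is a copy of the function with the trailing-star loop made optional through a flag; wildcmp_tail and skip_trailing_stars are names invented for this illustration.

```c
#include <stddef.h>

/* Copy of wildcmp in which the third loop runs only when
   skip_trailing_stars is nonzero. */
int wildcmp_tail(const char *wild, const char *string, int skip_trailing_stars) {
  const char *cp = NULL, *mp = NULL;

  while (*string && *wild != '*') {
    if (*wild != *string && *wild != '?') {
      return 0;
    }
    wild++;
    string++;
  }

  while (*string) {
    if (*wild == '*') {
      if (!*++wild) {
        return 1;
      }
      mp = wild;
      cp = string + 1;
    } else if (*wild == *string || *wild == '?') {
      wild++;
      string++;
    } else {
      wild = mp;
      string = cp++;
    }
  }

  if (skip_trailing_stars) {
    /* The string is used up; any '*' left in the pattern may match nothing. */
    while (*wild == '*') {
      wild++;
    }
  }
  return !*wild;
}
```

With the flag set, "test*" matches "test"; with it cleared, the leftover '*' makes the function report no match, because the string runs out before the second loop ever sees the '*'.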
 

There are 10 types of people in this world, those that understand binary and those who don't.


GeneralC# versionmemberSancy26-Oct-04 6:23 
Hi, I have a stupid question: could someone give me the C# version? :)
Thanks in advance.
GeneralRe: C# versionsussPsyk6621-Dec-04 3:39 
	private bool wildcmp(string wild, string str) 
	{
		int cp=0, mp=0;
	
		int i=0;
		int j=0;
		while ((i<str.Length) && (wild[j] != '*')) 
		{
			if ((wild[j] != str[i]) && (wild[j] != '?')) 
			{
				return false;
			}
			i++;
			j++;
		}
		
		while (i<str.Length) 
		{
			if (j<wild.Length && wild[j] == '*') 
			{
				if (++j >= wild.Length) // pre-increment, so the early exit for a trailing '*' can actually fire
				{
					return true;
				}
				mp = j;
				cp = i+1;
			} 
			else if (j<wild.Length && (wild[j] == str[i] || wild[j] == '?')) 
			{
				j++;
				i++;
			} 
			else 
			{
				j = mp;
				i = cp++;
			}
		}
		
		while (j<wild.Length && wild[j] == '*') 
		{
			j++;
		}
		return j>=wild.Length;
	}
 
This C# version works. I'm sure there are loads of improvements to be made though. Don't flame me for such bad code, I only started C# yesterday;)
GeneralRe: C# versionmemberIonut FIlip22-Feb-05 6:15 
A small fix:
   while ((i<str.Length) && (wild[j] != '*'))
should be
   while (i < str.Length && j < wild.Length && wild[j] != '*')
 
And a small improvement for case sensitivity:
private bool wildcmp(string wild, string str, bool case_sensitive)
{
   if (! case_sensitive)
   {
      wild = wild.ToLower();
      str = str.ToLower();
   }
 
   // rest of the code is the same
}

 
Ionut Filip
GeneralRe: C# versionmemberrobagar3-Apr-06 16:58 
hiya
 
Just thought I'd share my version of this code
 
- put the whole shebang into a class with public static methods
- fixed a bug where the pattern '?' matches all strings
- added an early-exit test for patterns that don't actually contain wildcards so it just defaults to normal string comparison
 
cheers
Rob
 

 

     /// <summary>
     /// Class providing wildcard string matching.
     /// </summary>
     public class Wildcard
     {
          private Wildcard()
          {
          }
 
          /// <summary>
          /// Array of valid wildcards
          /// </summary>
          private static char[] Wildcards = new char[]{'*', '?'};
 
          /// <summary>
          /// Returns true if the string matches the pattern which may contain * and ? wildcards.
          /// Matching is done without regard to case.
          /// </summary>
          /// <param name="pattern"></param>
          /// <param name="s"></param>
          /// <returns></returns>
          public static bool Match(string pattern, string s)
          {
               return Match(pattern, s, false);
          }
 
          /// <summary>
          /// Returns true if the string matches the pattern which may contain * and ? wildcards.
          /// </summary>
          /// <param name="pattern"></param>
          /// <param name="s"></param>
          /// <param name="caseSensitive"></param>
          /// <returns></returns>
          public static bool Match(string pattern, string s, bool caseSensitive)
          {
               // if not concerned about case, convert both string and pattern
               // to lower case for comparison
               if (!caseSensitive)
               {
                    pattern = pattern.ToLower();
                    s = s.ToLower();
               }
 
               // if pattern doesn't actually contain any wildcards, use simple equality
               if (pattern.IndexOfAny(Wildcards) == -1)
                    return (s == pattern);
 
               // otherwise do pattern matching
               int i=0;
               int j=0;
               while (i < s.Length && j < pattern.Length && pattern[j] != '*')
               {
                    if ((pattern[j] != s[i]) && (pattern[j] != '?'))
                    {
                         return false;
                    }
                    i++;
                    j++;
               }
 
               // if we have reached the end of the pattern without finding a * wildcard,
               // the match must fail if the string is longer or shorter than the pattern
               if (j == pattern.Length)
                    return s.Length == pattern.Length;
         
               int cp=0;
               int mp=0;
               while (i < s.Length)
               {
                    if (j < pattern.Length && pattern[j] == '*')
                    {
                         if (++j >= pattern.Length) // pre-increment, so the early exit for a trailing '*' can actually fire
                         {
                              return true;
                         }
                         mp = j;
                         cp = i+1;
                    }
                    else if (j < pattern.Length && (pattern[j] == s[i] || pattern[j] == '?'))
                    {
                         j++;
                         i++;
                    }
                    else
                    {
                         j = mp;
                         i = cp++;
                    }
               }
         
               while (j < pattern.Length && pattern[j] == '*')
               {
                    j++;
               }
 
               return j >= pattern.Length;
          }
     }

GeneralRe: C# versionmemberSancy5-Jun-06 16:01 
Thanks a lot. This is just what I've been looking for. :)
 
And it fades like the shadow in the night.
 
PhoeniX
GeneralConvert to java base on C# version [modified, better look :~ ]memberquangtin321-Mar-08 21:13 
Java version
We (Qn & Qg) just used search and replace to produce this Java version:
    public static boolean matcher(String value, String pattern) {
        if (pattern == null || value == null) {
            return false;
        }
 
        char[] Wildcards = new char[]{'*', '?'};
 
        pattern = pattern.toLowerCase();
        value = value.toLowerCase();
 
        // if pattern doesn't actually contain any wildcards, use simple equality
        if (pattern.indexOf(Wildcards[0]) == -1 && pattern.indexOf(Wildcards[1]) == -1) {
            return value.equals(pattern);
        }
 
        // otherwise do pattern matching
        int i = 0;
        int j = 0;
        while (i < value.length() && j < pattern.length() && pattern.charAt(j) != '*') {
            if (pattern.charAt(j) != value.charAt(i) && pattern.charAt(j) != '?') {
                return false;
            }
            i++;
            j++;
        }
 
        // if we have reached the end of the pattern without finding a * wildcard,
        // the match must fail if the String is longer or shorter than the pattern
        if (j == pattern.length()) {
            return value.length() == pattern.length();
        }
 
        int cp = 0;
        int mp = 0;
        while (i < value.length()) {
            if (j < pattern.length() && pattern.charAt(j) == '*') {
                if (++j >= pattern.length()) { // pre-increment, so the early exit for a trailing '*' can actually fire
                    return true;
                }
                mp = j;
                cp = i + 1;
            }
            else if (j < pattern.length() && (pattern.charAt(j) == value.charAt(i) || pattern.charAt(j) == '?')) {
                j++;
                i++;
            }
            else {
                j = mp;
                i = cp++;
            }
        }
 
        while (j < pattern.length() && pattern.charAt(j) == '*') {
            j++;
        }
 
        return j >= pattern.length();
    }
 
Unit test
  public void testmatcher() {
        System.out.println("testmatcher");
 
        String[][] matchPaire = {
            {"", ""},
            {"aa", "aa"},
            {"aa", "*"}, //value,pattern
            {"a", "?"},
            {"sdwerporasl;df", "*"},
            {"absdf zzzy", "*zzy"},
            {"abc", "*?"}};
 
        String[][] notMatchPaire = {
            {"", "?"},
            {"ab", "?"},
            {null, null},
            {"", "*a"},
            {"bsadfasdfwer234", "a*"},
            {"a fwer234", "*a"},
            };
 
        for (int i = 0; i < matchPaire.length; i++) {
            System.out.print("paire " + matchPaire[i][0] + " " + matchPaire[i][1]);
            assertTrue(ExchUtils.matcher(matchPaire[i][0], matchPaire[i][1]));
            System.out.println(" ok");
        }
 
        for (int i = 0; i < notMatchPaire.length; i++) {
            System.out.print("paire " + notMatchPaire[i][0] + " " + notMatchPaire[i][1]);
            assertFalse(ExchUtils.matcher(notMatchPaire[i][0], notMatchPaire[i][1]));
            System.out.println(" ok");
        }
    }
 
thank you all.
 
ktmt's member.
modified on Sunday, March 30, 2008 12:42 PM

Last Updated 15 Feb 2005
Article Copyright 2001 by Jack Handy
Everything else Copyright © CodeProject, 1999-2013