|
|
Comments and Discussions
|
|
 |

|
A small fix: while ((i<str.Length) && (wild[j] != '*')) should be while (i < str.Length && j < wild.Length && wild[j] != '*') And a small improvement for case sensitivity: private bool wildcmp(string wild, string str, bool case_sensitive) { if (! case_sensitive) { wild = wild.ToLower(); str = str.ToLower(); } // rest of the code is the same } Ionut Filip
|
|
|
|

|
hiya Just thought I'd share my version of this code - put the whole shebang into a class with public static methods - fixed a bug where the pattern '?' matches all strings - added an early-exit test for patterns that don't actually contain wildcards so it just defaults to normal string comparison cheers Rob
/// <summary> /// Class providing wildcard string matching. /// </summary> public class Wildcard { private Wildcard() { } /// <summary> /// Array of valid wildcards /// </summary> private static char[] Wildcards = new char[]{'*', '?'}; /// <summary> /// Returns true if the string matches the pattern which may contain * and ? wildcards. /// Matching is done without regard to case. /// </summary> /// <param name="pattern"></param> /// <param name="s"></param> /// <returns></returns> public static bool Match(string pattern, string s) { return Match(pattern, s, false); } /// <summary> /// Returns true if the string matches the pattern which may contain * and ? wildcards. /// </summary> /// <param name="pattern"></param> /// <param name="s"></param> /// <param name="caseSensitive"></param> /// <returns></returns> public static bool Match(string pattern, string s, bool caseSensitive) { // if not concerned about case, convert both string and pattern // to lower case for comparison if (!caseSensitive) { pattern = pattern.ToLower(); s = s.ToLower(); } // if pattern doesn't actually contain any wildcards, use simple equality if (pattern.IndexOfAny(Wildcards) == -1) return (s == pattern); // otherwise do pattern matching int i=0; int j=0; while (i < s.Length && j < pattern.Length && pattern[j] != '*') { if ((pattern[j] != s[i]) && (pattern[j] != '?')) { return false; } i++; j++; } // if we have reached the end of the pattern without finding a * wildcard, // the match must fail if the string is longer or shorter than the pattern if (j == pattern.Length) return s.Length == pattern.Length; int cp=0; int mp=0; while (i < s.Length) { if (j < pattern.Length && pattern[j] == '*') { if ((j++)>=pattern.Length) { return true; } mp = j; cp = i+1; } else if (j < pattern.Length && (pattern[j] == s[i] || pattern[j] == '?')) { j++; i++; } else { j = mp; i = cp++; } } while (j < pattern.Length && pattern[j] == '*') { j++; } return j >= pattern.Length; } }
|
|
|
|

|
Thanks a lot. This is just what i've been looking for.
And it fades like the shadow in the night.
PhoeniX
|
|
|
|

|
Java version
We (Qn & Qg) just search and replace to procedure this java version,
public static boolean matcher(String value, String pattern) {
if (pattern == null || value == null) {
return false;
}
char[] Wildcards = new char[]{'*', '?'};
pattern = pattern.toLowerCase();
value = value.toLowerCase();
if (pattern.indexOf(Wildcards[0]) == -1 && pattern.indexOf(Wildcards[1]) == -1) {
return value.equals(pattern);
}
int i = 0;
int j = 0;
while (i < value.length() && j < pattern.length() && pattern.charAt(j) != '*') {
if (pattern.charAt(j) != value.charAt(i) && pattern.charAt(j) != '?') {
return false;
}
i++;
j++;
}
if (j == pattern.length()) {
return value.length() == pattern.length();
}
int cp = 0;
int mp = 0;
while (i < value.length()) {
if (j < pattern.length() && pattern.charAt(j) == '*') {
if ((j++) >= pattern.length()) {
return true;
}
mp = j;
cp = i + 1;
}
else if (j < pattern.length() && (pattern.charAt(j) == value.charAt(i) || pattern.charAt(j) == '?')) {
j++;
i++;
}
else {
j = mp;
i = cp++;
}
}
while (j < pattern.length() && pattern.charAt(j) == '*') {
j++;
}
return j >= pattern.length();
}
Unit test
public void testmatcher() {
System.out.println("testmatcher");
String[][] matchPaire = {
{"", ""},
{"aa", "aa"},
{"aa", "*"}, {"a", "?"},
{"sdwerporasl;df", "*"},
{"absdf zzzy", "*zzy"},
{"abc", "*?"}};
String[][] notMatchPaire = {
{"", "?"},
{"ab", "?"},
{null, null},
{"", "*a"},
{"bsadfasdfwer234", "a*"},
{"a fwer234", "*a"},
};
for (int i = 0; i < matchPaire.length; i++) {
System.out.print("paire " + matchPaire[i][0] + " " + matchPaire[i][1]);
assertTrue(ExchUtils.matcher(matchPaire[i][0], matchPaire[i][1]));
System.out.println(" ok");
}
for (int i = 0; i < notMatchPaire.length; i++) {
System.out.print("paire " + notMatchPaire[i][0] + " " + notMatchPaire[i][1]);
assertFalse(ExchUtils.matcher(notMatchPaire[i][0], notMatchPaire[i][1]));
System.out.println(" ok");
}
}
thank you all.
ktmt's member.
modified on Sunday, March 30, 2008 12:42 PM
|
|
|
|

|
Be aware that there is a bug in this C# version.
I am still working on figuring it out fully, but:
in this code segment
int cp=0;
int mp=0;
while (i < s.Length)
{
if (j < pattern.Length && pattern[j] == '*')
{
if ((j++)>=pattern.Length)
return true;
Going into the final "if" line shown here, the maximum value that j may have is (pattern.length-1), due to the first "if" test. Then we see (j++) compared. But, the value of (j++) is the value of "j" BEFORE being incremented and thus is a maximum of (pattern.length-1) and is therefore NEVER >= pattern.length. Only after the if test is completed is j actually incremented.
So the following return is never taken.
Perhaps it can be fixed by changing j++ to ++j... but I can't tell that until I complete the analysis.
On a slightly different topic, I will state my opinion as a professional programmer. This demonstrates the extremely importance of EXTENSIVE COMMENTS in code explaining NOT what the code does, but "what the code is supposed to do" in each section. If such comments were in place, this would be an easy maintenance fix. Without them, I am having to analyze what the code DOES and, from that, try to discern what the programmer INTENDED the code to do. And, I have to consider all the possible wildcard permutations just like the original programmer did. I essentially have to reinvent the wheel... because the user manual is missing.
Everyone, especially Gurus, should put extensive comments in their code on "what it is intended to do". The only downside is lack of job security, because now someone other than you can fix the code. If you have that low of opinion of your worth to your employer, and are also lacking all compassion for others, then don't comment your code.
|
|
|
|

|
I think this:
if ((j++) >= pattern.Length)
{
return true;
}
Needs to change to this:
if (++j >= pattern.Length)
{
return true;
}
Otherwise the early break does not happen and the whole string is searched.
|
|
|
|

|
most C compare functions return zero when the values are equal, but this function returns non-zero.
Personally, I find the non-zero to be more intuitive .. but after years of forcing myself to check for zero I find it a bit counter-intuitive.
I think I'll just rename the function when I add it to my library
But that certainly wont stop me from using this wonderful routine.
Many sincere thanks ...
|
|
|
|

|
David Patrick wrote:
most C compare functions return zero when the values are equal, but this function returns non-zero.
You make a good point. I probably should have made it behave like the strcmp() type functions. I'm a bit afraid to change it at this point since it has been posted for so long. It should be an easy fix for you or anyone else who is used to C style string comparisons. The C++ people here probably like the current behavior I would imagine.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I disagree. The return value for strcmp() is more than simply a test for equality, it tells you which string is greater than the other. A zero return value for strcmp() makes sense, but not for wildcmp() since the return value is strictly boolean, match or no match. The current implementation is fine (although some people might be picky about the return type, int vs bool). Perhaps to avoid confusion with string comparison functions, the function should be renamed to wildmatch() or something similar.
|
|
|
|

|
You are completely right, Vic.
It is interesting that I have renamed the function in my code to wildmatch().
It would be a good new name.
Regards, Voja
|
|
|
|

|
This is realy nice & and useful code. I used to write something similar, but your example is simplier and shorter.
Because it lacks comments, I spent some time to understand (before I saw comment form Targys - real tutorial ) and it is clear now. Thanks to both of you!
To 'wise' guys, flamers, and other people who has nothing to do instead of arguing:
- If the code has a bug, report but don't pretend you are a genius or a guru. If you can do it better, submit an article.
If you don't like the code, don't use it!
And about NULL pointers:
Idiot-proofing should be implemented at the level where data (function arguments) is acquired and prepared, not in such low-level function.
Besides that, I tested several functions from string.h with NULL parameters and every single one threw an exception. No further comments...
Regards, Voja
|
|
|
|

|
Great piece of code, but I have one minor improvement. It appears to me that the variable "cp" doesn't do anything and servers no purpose.
If I'm correct, then you can safely remove the line:
cp = string+1;
and also remove:
string = cp;
and replace:
cp = string++;
with:
++string;
I'm believe the results would be identical.
|
|
|
|

|
Maybe I posted too soon. I didn't think there was a way for cp to not equal string+1. But, after thinking about it some more I found a pattern type that would:
*???c*
It's interesting how the loop keeps shifting back and forth with this type of pattern.
However, using a test string of "testing" with the above pattern the match still failed (correctly) using both algorithms. But, there easily may be a pattern and string combination that wouldn't work without cp.
|
|
|
|

|
Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.
we are here to help each other get through this thing, whatever it is Vonnegut jr.
sighist || Agile Programming | doxygen
|
|
|
|

|
peterchen wrote:
Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.
The wildcmp() function is meant to be lightweight and fast.
If the extra functionality of multple specs is needed and you don't want to parse the input yourself then you can go ahead and use the PathMatchSpec() API.
Just make sure you don't mind these limitations:
1. Adding another dependancy to your executable by including the lib
2. Not portable (wildcmp() compiles fine under unix)
3. More memory overhead (larger code footprint)
4. The horrible slowness
I have ran some benchmarks and pasted the results below. I can provide the .cpp file for the benchmarks if anyone is interested.
-Jack
10MM iterations.
Compiled as a console app using vc6 in release mode with /O2 optimization.
Ran on a pM 1.7ghz
"C:\\T*s*.t?t", "C:\\Test\\File.txt"
PathMatchSpec MATCH => 20.5090s
wildcmp MATCH => 1.0320s
"C:\\T*s*.t??t", "C:\\Test\\File.txt"
PathMatchSpec NO MATCH => 45.7760s
wildcmp NO MATCH => 0.8910s
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
Tried these wildcards, and they show different results in your code and in Windows Explorer's search command.
??x*
*so*
??so*
??so??
Lack of comments in the code also make it a bit difficult to understand. On the whole however, good job!
Bikram
|
|
|
|

|
The Windows Explorer search is not a straight wildcard match. It essentially adds *'s to either end of your input string so "b?r" matches "foobar.txt". I take a more literal approach. This function does not presume to be smarter than the caller. It is not only meant for files, it is also very useful for checking hostmasks for example.
If you think my function is producing bad results, can you paste an example of the string along with the wildcard?
Thanks,
Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I want case insenstive wildcmp function. Could anyone help me?
|
|
|
|

|
Simply wrap the code that compares charaters in toupper() calls.
eg.
if ((toupper(*wild) != toupper(*string)) && (*wild != '?')) {
Neville Franks, Author of ED for Windows www.getsoft.com and coming soon: Surfulater www.surfulater.com
|
|
|
|

|
toupper doesnt support char characters. You could optimize this code.
This code compares also multilingual characters
#include
#define BIT5 0x20
char buf[] = "this is ®Ñê test";
char *pbuf;
int lower(int ch);
int lower(int ch)
{
if ((ch==64)||(ch==91)||(ch==92)||(ch==93))
return ch &= ~(ch & BIT5);
return ch |= BIT5;
}
int wildcmp(char *wild, char *string) {
char *cp, *mp;
while ((*string) && (*wild != '*')) {
if ((*wild != *string) && (*wild != '?')) {
return 0;
}
wild++;
string++;
}
while (*string) {
if (*wild == '*') {
if (!*++wild) {
return 1;
}
mp = wild;
cp = string+1;
} else if ((*wild == *string) || (*wild == '?')) {
wild++;
string++;
} else {
wild = mp;
string = cp++;
}
}
while (*wild == '*') {
wild++;
}
return !*wild;
}
int main() {
for (pbuf = &buf[0]; *pbuf; ++pbuf)
*pbuf= lower (*pbuf);
if (wildcmp("*®ñê*", buf)){
printf ("match : %s \n", buf);
} else {
printf ("not match: %s \n",buf);
}
}
|
|
|
|

|
Techiex wrote:
toupper doesnt support char characters
Since when?
ASSERT('A' == toupper('a'));
"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow
|
|
|
|

|
Be careful when using toupper(), some of the CRT variations of this function _only_ work when the input is known to be lowercase. For example, the return value is invalid for _toupper('A') and some implementations of toupper('A') as well...check the documentation.
|
|
|
|

|
Vic Mackey wrote:
Be careful when using toupper(), some of the CRT variations of this function _only_ work when the input is known to be lowercase.
Which is why I specified toupper() instead of _toupper(). The latter is nothing but a #define directive that does no checking.
In any case, I was simply responding to Techiex's statement that "toupper doesnt support char characters."
"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow
|
|
|
|

|
Doesnt seems to work for me.
|
|
|
|

|
Very nice, compact, works great and is very fast. Thanks, Jack!
Best wishes,
Hans
|
|
|
|

|
I've seen a few people in these boards complain that I didn't check for null pointers in this function. This is a C function and the last time I checked, passing NULL to strcmp or any other C string function will segfault. I'm not saying this is great, and if you wanted to add a check for null, that would be fine. I just don't think that this is a 'bug' (if you can even call it that) worth flaming an otherwise great function.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I agree. This is not what I meant.
Efrat
|
|
|
|

|
Yeah, I wasn't talking to you, you were respectful. I was talking to the people below in the 'too complicated' thread. Namely 'The C++ Guru'.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
great code, but if I'm not mistaken
cp can point beyon string array bounds.
try: wild="*a", string="xyzab"
correction:
string = cp++;
should be changed to:
string = cp;
if(*cp) cp++;
|
|
|
|

|
I can not reproduce this bug.
wildcmp("*a", "xyzab") returns 0 as it should.
Can you elaborate on what you are doing to cause it to function incorrectly?
Thanks,
Jack
|
|
|
|

|
You're right, it is not a bug in functionality,
but when I debugged it I saw that
for the above input, the line:
string = cp++;
causes cp to point one place beyond the string array bounds.
(When string points to the last char 'b', cp points to \0.
On the next iteration, string will be advanced to point to \0,
and cp will be advanced to point to one place after the \0).
Although it is not critical,I thought it is worth fixing.
regards,
Efrat
|
|
|
|

|
Thanks, I'll have to look into this once I get some free time.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
Actually, it was brought to my attention that it is probably
legal to do that in C, since the pointer is not used afterwards.
So, maybe it's just a matter of coding practice.
|
|
|
|

|
I saw this one a long time ago, and finally have a use for it. Thank you very much.
Chris Richardson
Programmers find all sorts of ingenious ways to screw ourselves over. - Tim Smith
|
|
|
|

|
No problem. I hope it serves you well.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I converted this into C# and bingo...
I tried to break it but couldn't
|
|
|
|

|
You didn't try hard enough:
wildcmp(NULL,whatever)
will break it.
Hector Santos, CTO
Santronics Software, Inc.
http:/www.santronics.com
|
|
|
|

|
The C# code would NOT break, instead the framework would throw an exception in this case. With appropriate exception handling routines, pointer checking in C# is useless overhead.
Cheers anyway,
K. C. Dorner
IBM Billing Solution
|
|
|
|

|
Exception handling in the dotnet framework is slow like hell
we will see
|
|
|
|

|
Where can I get the C# version?
- Bruce
BRCKCC
|
|
|
|
|

|
Yeah...good stuff. How long did it take you?
|
|
|
|

|
Could anyone explain how this code works for me? I am having trouble trying to figure out what is going on in a couple places. I would think a short explination would help out some other people like me who don't know C. Thanks in advance!
|
|
|
|

|
Ok, I'll try, even though I think it would be a good idea for you to learn C
The first loop basically goes through both strings step by step until there is a * in the wild string.
When ever the characters of the both strings don't match and the character in the wild string is no ? the function returns 0 (FALSE) = no match.
(I'm not a hundred percent sure, 'cause I don't have time to test it, but I guess this loop is for speed reasons only)
The second loop does the hard thing:
if (*wild == '*') {
if (!*++wild) {
return 1;
}
mp = wild;
cp = string+1;
This if stores the positions of the string pointers, when *wild is a star
(*wild is the character of wild at the current position of the pointer *wild - easy explanation, not 100% correct)
If this * is the last character in the wild string, it returns 1 (TRUE) = match.
} else {
wild = mp;
string = cp++;
This part if the ifs basically solves two things in one.
Firstly it's responsible to increase the pointer position of the string string pointer.
Secondly it returns the two pointers after a wrong go through to the end.
} else if ((*wild == *string) || (*wild == '?')) {
wild++;
string++;
This part does the same as the first loop, just after the first *.
while (*wild == '*') {
wild++;
}
Well, this loop just ingores several * at the end of the wild string.
return !*wild;
And now, that's a nice one
I like it.
After going through all the * in the last loop, the wild string can now contain either
- nothing anymore, that means *wild is NULL, or
- anything but nothing.
Is *wild NULL that means all the comparisons were successful and the function can return 1.
Or easier: it returns !*wild = not NULL = 1
Is it not NULL, but just any character, !*wild will be 0.
So this
return !*wild;
basically replaces
if (*wild = '') {
return 1;
} else {
return 0;
}
or something like this.
An example to explain how it really works:
wild is 'bl?h.*g'
string is 'blah.jpgeg'
After the first loop where 'b' is 'b' and 'l' is 'l' and '?' is 'a' and 'h' is 'h' and '.' is '.'
the position of the two pointers *wild and *string look like this:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
Now the second loop starts:
the pointer *string is increased until it points to a character that is the same as the *wild+1.
That means it looks for a 'g' in the string string beginning from the current position.
This increment is done by the last else, as explained above as firstly.
So it will look like this
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
Now the second part if the ifs increases both pointers, because 'g' == 'g'.
*wild is now NULL because the g was the last character in the wild string.
*string is 'e'
Because Null != 'e' it sets back the pointers to the values they had before the comparison rush.
This is done again by the last else part, as explained as secondly above.
But the diferrence is now, that the *string pointer is one character further than then.
It looks now like that:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
This change compared to the first time is done by the cp++ of
} else {
wild = mp;
string = cp++;
where the pointer cp is incremented.
The same game starts all over again and again it doesn't succeed.
So next time it will look like this:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and finally:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
And this will end the loop, as *string will be NULL after the next run.
Well, it's not an easy explanation, as the problem of wildcard search is not really as easy as C++ Guru wants it to have.
(Maybe for a guru, it's easy )
Don't hesitate to ask, if the answer is not understandable.
And of course please correct me, if something is wrong!
Targys
|
|
|
|

|
why use local variables and so many loops? it can be much easier to match two strings. i don't understand why so many people spend hours to search the web for wildcard matching when they can write it themselves in 5 minutes time??! the code below could be shorter but it's easier to read like this. i didn't debug it very much but it will work, though. // ------------------------------------------------------------------- int wildcmp(const char* wild, const char* string) // ------------------------------------------------------------------- { if(*wild == *string) return '\0' == *string || wildcmp(++wild, ++string); if('\0' == *string) return '*' == *wild && wildcmp(++wild, string); switch(*wild) { case '?': return wildcmp(++wild, ++string); case '*': wild++; if('\0' == *wild) return 1; while(*string != '\0') if(wildcmp(wild, string++)) return 1; default: return 0; } } yours, the c++ guru himself.
|
|
|
|

|
Works fine, can't beat!
But how the hell can you make this even shorter???
|
|
|
|

|
no problem, but it looks very ugly. but i think you cannot do it with less characters. i would be glad to find out i was wrong, so try to do it shorter! int wildcmp(const char* w, const char* s) { if(*w == *s) return !*s || wildcmp(++w, ++s); if(!*s) return '*' == *w && wildcmp(++w, s); if('?' == *w) return wildcmp(++w, ++s); if('*' == *w) if(!*++w) return 1; else while(*s) if(wildcmp(w, s++)) return 1; return 0; }
|
|
|
|

|
I don't know about the rest of you, but I personally prefer FAST code to SHORT code. The function you posted is considerably slower, do some benchmarks, I have.
This code....
int main(int argc, char **argv) {
int x;
for (x=0; x<9999999; x++) {
wildcmp("*t?st?n*this*t*", "testin this sh*t");
}
}using your function...
real 0m13.170s
user 0m13.120s
sys 0m0.020s using my function...
real 0m6.804s
user 0m6.790s
sys 0m0.000s
Furthermore I don't see why you criticize me for using loops when you are using a recursive function.
Thanks for your reply anyhow,
Jack
|
|
|
|

|
i didn't criticize you.
i just said that wildcard matching is not a very heavy problem to solve and that i don't understand why people spend hours looking around to find code while it takes a cigarette's lifetime to do it yourself.
first of all i feel happy that my function works at all! secondly, i know that recursion produces noticeable overhead, and as i said it was not my primary objective to write a fast function. i just tried to write it in 5 minutes time.
|
|
|
|
|
 |
|
|
General News Suggestion Question Bug Answer Joke Rant Admin
|
Matches a string against a wildcard string such as "*.*" or "bl?h.*" etc. This is good for file globbing or to match hostmasks.
| Type | Article |
| Licence | |
| First Posted | 1 May 2001 |
| Views | 683,057 |
| Bookmarked | 89 times |
|
|