 |

|
You are completely right, Vic.
It is interesting that I have renamed the function in my code to wildmatch().
It would be a good new name.
Regards, Voja
|
|
|
|

|
This is realy nice & and useful code. I used to write something similar, but your example is simplier and shorter.
Because it lacks comments, I spent some time to understand (before I saw comment form Targys - real tutorial ) and it is clear now. Thanks to both of you!
To 'wise' guys, flamers, and other people who has nothing to do instead of arguing:
- If the code has a bug, report but don't pretend you are a genius or a guru. If you can do it better, submit an article.
If you don't like the code, don't use it!
And about NULL pointers:
Idiot-proofing should be implemented at the level where data (function arguments) is acquired and prepared, not in such low-level function.
Besides that, I tested several functions from string.h with NULL parameters and every single one threw an exception. No further comments...
Regards, Voja
|
|
|
|

|
Great piece of code, but I have one minor improvement. It appears to me that the variable "cp" doesn't do anything and servers no purpose.
If I'm correct, then you can safely remove the line:
cp = string+1;
and also remove:
string = cp;
and replace:
cp = string++;
with:
++string;
I'm believe the results would be identical.
|
|
|
|

|
Maybe I posted too soon. I didn't think there was a way for cp to not equal string+1. But, after thinking about it some more I found a pattern type that would:
*???c*
It's interesting how the loop keeps shifting back and forth with this type of pattern.
However, using a test string of "testing" with the above pattern the match still failed (correctly) using both algorithms. But, there easily may be a pattern and string combination that wouldn't work without cp.
|
|
|
|

|
Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.
we are here to help each other get through this thing, whatever it is Vonnegut jr.
sighist || Agile Programming | doxygen
|
|
|
|

|
peterchen wrote:
Just a thought:
the PathMatchSpec SLWU API could provide similar. I guess it does have some differences (e.g. allowing to specify multiple specs, separated by semicolon), but it might be a simple alternative for many similar tasks.
The wildcmp() function is meant to be lightweight and fast.
If the extra functionality of multple specs is needed and you don't want to parse the input yourself then you can go ahead and use the PathMatchSpec() API.
Just make sure you don't mind these limitations:
1. Adding another dependancy to your executable by including the lib
2. Not portable (wildcmp() compiles fine under unix)
3. More memory overhead (larger code footprint)
4. The horrible slowness
I have ran some benchmarks and pasted the results below. I can provide the .cpp file for the benchmarks if anyone is interested.
-Jack
10MM iterations.
Compiled as a console app using vc6 in release mode with /O2 optimization.
Ran on a pM 1.7ghz
"C:\\T*s*.t?t", "C:\\Test\\File.txt"
PathMatchSpec MATCH => 20.5090s
wildcmp MATCH => 1.0320s
"C:\\T*s*.t??t", "C:\\Test\\File.txt"
PathMatchSpec NO MATCH => 45.7760s
wildcmp NO MATCH => 0.8910s
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
Tried these wildcards, and they show different results in your code and in Windows Explorer's search command.
??x*
*so*
??so*
??so??
Lack of comments in the code also make it a bit difficult to understand. On the whole however, good job!
Bikram
|
|
|
|

|
The Windows Explorer search is not a straight wildcard match. It essentially adds *'s to either end of your input string so "b?r" matches "foobar.txt". I take a more literal approach. This function does not presume to be smarter than the caller. It is not only meant for files, it is also very useful for checking hostmasks for example.
If you think my function is producing bad results, can you paste an example of the string along with the wildcard?
Thanks,
Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I want case insenstive wildcmp function. Could anyone help me?
|
|
|
|

|
Simply wrap the code that compares charaters in toupper() calls.
eg.
if ((toupper(*wild) != toupper(*string)) && (*wild != '?')) {
Neville Franks, Author of ED for Windows www.getsoft.com and coming soon: Surfulater www.surfulater.com
|
|
|
|

|
toupper doesnt support char characters. You could optimize this code.
This code compares also multilingual characters
#include
#define BIT5 0x20
char buf[] = "this is ®Ñê test";
char *pbuf;
int lower(int ch);
int lower(int ch)
{
if ((ch==64)||(ch==91)||(ch==92)||(ch==93))
return ch &= ~(ch & BIT5);
return ch |= BIT5;
}
int wildcmp(char *wild, char *string) {
char *cp, *mp;
while ((*string) && (*wild != '*')) {
if ((*wild != *string) && (*wild != '?')) {
return 0;
}
wild++;
string++;
}
while (*string) {
if (*wild == '*') {
if (!*++wild) {
return 1;
}
mp = wild;
cp = string+1;
} else if ((*wild == *string) || (*wild == '?')) {
wild++;
string++;
} else {
wild = mp;
string = cp++;
}
}
while (*wild == '*') {
wild++;
}
return !*wild;
}
int main() {
for (pbuf = &buf[0]; *pbuf; ++pbuf)
*pbuf= lower (*pbuf);
if (wildcmp("*®ñê*", buf)){
printf ("match : %s \n", buf);
} else {
printf ("not match: %s \n",buf);
}
}
|
|
|
|

|
Techiex wrote:
toupper doesnt support char characters
Since when?
ASSERT('A' == toupper('a'));
"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow
|
|
|
|

|
Be careful when using toupper(), some of the CRT variations of this function _only_ work when the input is known to be lowercase. For example, the return value is invalid for _toupper('A') and some implementations of toupper('A') as well...check the documentation.
|
|
|
|

|
Vic Mackey wrote:
Be careful when using toupper(), some of the CRT variations of this function _only_ work when the input is known to be lowercase.
Which is why I specified toupper() instead of _toupper(). The latter is nothing but a #define directive that does no checking.
In any case, I was simply responding to Techiex's statement that "toupper doesnt support char characters."
"Opinions are neither right nor wrong. I cannot change your opinion. I can, however, change what influences your opinion." - David Crow
|
|
|
|

|
Doesnt seems to work for me.
|
|
|
|

|
Very nice, compact, works great and is very fast. Thanks, Jack!
Best wishes,
Hans
|
|
|
|

|
I've seen a few people in these boards complain that I didn't check for null pointers in this function. This is a C function and the last time I checked, passing NULL to strcmp or any other C string function will segfault. I'm not saying this is great, and if you wanted to add a check for null, that would be fine. I just don't think that this is a 'bug' (if you can even call it that) worth flaming an otherwise great function.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I agree. This is not what I meant.
Efrat
|
|
|
|

|
Yeah, I wasn't talking to you, you were respectful. I was talking to the people below in the 'too complicated' thread. Namely 'The C++ Guru'.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
great code, but if I'm not mistaken
cp can point beyon string array bounds.
try: wild="*a", string="xyzab"
correction:
string = cp++;
should be changed to:
string = cp;
if(*cp) cp++;
|
|
|
|

|
I can not reproduce this bug.
wildcmp("*a", "xyzab") returns 0 as it should.
Can you elaborate on what you are doing to cause it to function incorrectly?
Thanks,
Jack
|
|
|
|

|
You're right, it is not a bug in functionality,
but when I debugged it I saw that
for the above input, the line:
string = cp++;
causes cp to point one place beyond the string array bounds.
(When string points to the last char 'b', cp points to \0.
On the next iteration, string will be advanced to point to \0,
and cp will be advanced to point to one place after the \0).
Although it is not critical,I thought it is worth fixing.
regards,
Efrat
|
|
|
|

|
Thanks, I'll have to look into this once I get some free time.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
Actually, it was brought to my attention that it is probably
legal to do that in C, since the pointer is not used afterwards.
So, maybe it's just a matter of coding practice.
|
|
|
|

|
I saw this one a long time ago, and finally have a use for it. Thank you very much.
Chris Richardson
Programmers find all sorts of ingenious ways to screw ourselves over. - Tim Smith
|
|
|
|

|
No problem. I hope it serves you well.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|

|
I converted this into C# and bingo...
I tried to break it but couldn't
|
|
|
|

|
You didn't try hard enough:
wildcmp(NULL,whatever)
will break it.
Hector Santos, CTO
Santronics Software, Inc.
http:/www.santronics.com
|
|
|
|

|
The C# code would NOT break, instead the framework would throw an exception in this case. With appropriate exception handling routines, pointer checking in C# is useless overhead.
Cheers anyway,
K. C. Dorner
IBM Billing Solution
|
|
|
|

|
Exception handling in the dotnet framework is slow like hell
we will see
|
|
|
|

|
Where can I get the C# version?
- Bruce
BRCKCC
|
|
|
|
|

|
Yeah...good stuff. How long did it take you?
|
|
|
|

|
Could anyone explain how this code works for me? I am having trouble trying to figure out what is going on in a couple places. I would think a short explination would help out some other people like me who don't know C. Thanks in advance!
|
|
|
|

|
Ok, I'll try, even though I think it would be a good idea for you to learn C
The first loop basically goes through both strings step by step until there is a * in the wild string.
When ever the characters of the both strings don't match and the character in the wild string is no ? the function returns 0 (FALSE) = no match.
(I'm not a hundred percent sure, 'cause I don't have time to test it, but I guess this loop is for speed reasons only)
The second loop does the hard thing:
if (*wild == '*') {
if (!*++wild) {
return 1;
}
mp = wild;
cp = string+1;
This if stores the positions of the string pointers, when *wild is a star
(*wild is the character of wild at the current position of the pointer *wild - easy explanation, not 100% correct)
If this * is the last character in the wild string, it returns 1 (TRUE) = match.
} else {
wild = mp;
string = cp++;
This part if the ifs basically solves two things in one.
Firstly it's responsible to increase the pointer position of the string string pointer.
Secondly it returns the two pointers after a wrong go through to the end.
} else if ((*wild == *string) || (*wild == '?')) {
wild++;
string++;
This part does the same as the first loop, just after the first *.
while (*wild == '*') {
wild++;
}
Well, this loop just ingores several * at the end of the wild string.
return !*wild;
And now, that's a nice one
I like it.
After going through all the * in the last loop, the wild string can now contain either
- nothing anymore, that means *wild is NULL, or
- anything but nothing.
Is *wild NULL that means all the comparisons were successful and the function can return 1.
Or easier: it returns !*wild = not NULL = 1
Is it not NULL, but just any character, !*wild will be 0.
So this
return !*wild;
basically replaces
if (*wild = '') {
return 1;
} else {
return 0;
}
or something like this.
An example to explain how it really works:
wild is 'bl?h.*g'
string is 'blah.jpgeg'
After the first loop where 'b' is 'b' and 'l' is 'l' and '?' is 'a' and 'h' is 'h' and '.' is '.'
the position of the two pointers *wild and *string look like this:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
Now the second loop starts:
the pointer *string is increased until it points to a character that is the same as the *wild+1.
That means it looks for a 'g' in the string string beginning from the current position.
This increment is done by the last else, as explained above as firstly.
So it will look like this
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
Now the second part if the ifs increases both pointers, because 'g' == 'g'.
*wild is now NULL because the g was the last character in the wild string.
*string is 'e'
Because Null != 'e' it sets back the pointers to the values they had before the comparison rush.
This is done again by the last else part, as explained as secondly above.
But the diferrence is now, that the *string pointer is one character further than then.
It looks now like that:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
This change compared to the first time is done by the cp++ of
} else {
wild = mp;
string = cp++;
where the pointer cp is incremented.
The same game starts all over again and again it doesn't succeed.
So next time it will look like this:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
and finally:
*wild |
'bl?h.*g'
'blah.jpgeg'
*string |
And this will end the loop, as *string will be NULL after the next run.
Well, it's not an easy explanation, as the problem of wildcard search is not really as easy as C++ Guru wants it to have.
(Maybe for a guru, it's easy )
Don't hesitate to ask, if the answer is not understandable.
And of course please correct me, if something is wrong!
Targys
|
|
|
|

|
why use local variables and so many loops? it can be much easier to match two strings. i don't understand why so many people spend hours to search the web for wildcard matching when they can write it themselves in 5 minutes time??! the code below could be shorter but it's easier to read like this. i didn't debug it very much but it will work, though. // ------------------------------------------------------------------- int wildcmp(const char* wild, const char* string) // ------------------------------------------------------------------- { if(*wild == *string) return '\0' == *string || wildcmp(++wild, ++string); if('\0' == *string) return '*' == *wild && wildcmp(++wild, string); switch(*wild) { case '?': return wildcmp(++wild, ++string); case '*': wild++; if('\0' == *wild) return 1; while(*string != '\0') if(wildcmp(wild, string++)) return 1; default: return 0; } } yours, the c++ guru himself.
|
|
|
|

|
Works fine, can't beat!
But how the hell can you make this even shorter???
|
|
|
|

|
no problem, but it looks very ugly. but i think you cannot do it with less characters. i would be glad to find out i was wrong, so try to do it shorter! int wildcmp(const char* w, const char* s) { if(*w == *s) return !*s || wildcmp(++w, ++s); if(!*s) return '*' == *w && wildcmp(++w, s); if('?' == *w) return wildcmp(++w, ++s); if('*' == *w) if(!*++w) return 1; else while(*s) if(wildcmp(w, s++)) return 1; return 0; }
|
|
|
|

|
I don't know about the rest of you, but I personally prefer FAST code to SHORT code. The function you posted is considerably slower, do some benchmarks, I have.
This code....
int main(int argc, char **argv) {
int x;
for (x=0; x<9999999; x++) {
wildcmp("*t?st?n*this*t*", "testin this sh*t");
}
}using your function...
real 0m13.170s
user 0m13.120s
sys 0m0.020s using my function...
real 0m6.804s
user 0m6.790s
sys 0m0.000s
Furthermore I don't see why you criticize me for using loops when you are using a recursive function.
Thanks for your reply anyhow,
Jack
|
|
|
|

|
i didn't criticize you.
i just said that wildcard matching is not a very heavy problem to solve and that i don't understand why people spend hours looking around to find code while it takes a cigarette's lifetime to do it yourself.
first of all i feel happy that my function works at all! secondly, i know that recursion produces noticeable overhead, and as i said it was not my primary objective to write a fast function. i just tried to write it in 5 minutes time.
|
|
|
|
|
|

|
Gotta go with c++ guru on this one.
unless you are writing code for slow embedded processors with 1K of RAM, you are better off with the much clearer code from c++ guru. I can see at a glance how his code works. However, if there happened to be a bug in Handy code, or i needed to upgrade the functionality, i wouldn't have a clue where to start.
Readability is extremely important when sharing code around.
|
|
|
|

|
What is unclear about my code? Do you not understand pointer arithmetic?
-Jack
|
|
|
|

|
It is just clearer to use C++ gurus style. I take a lot of code from code project and am very grateful for it. With most of the classes i have to make some modification or other as I am writing a commercial application which often requires tweaks. I need wildcard matching in a number of places but maybe a little more than * and ?. Therefore, the first thing i considered was how i could add extra tokens. It appears at a glance that c++ gurus code is easier to grasp.
I understand pointer arithmethic just fine, and i believe the claim that your code is twice as fast. It is more important for me to have bug free code as wildcard comparison functionality is carried out on a very small scale in my application. Any bugs are headaches. I am firmly a c/c++ devotee and hate all the luggage that comes with .NET, VB, etc. But we can have a small tradeoff for readability in the general case. Why give those guys ammunition for their 'C/C++ code is unreadable and only hacks use it' argument?
No offense but i will use the gurus code because i have my reasons. It is not fair to blindly knock it though.
anyway, it is better you submitted this article than no article.
cheers
|
|
|
|

|
This code looks like it would perform slow as hell. A recursive function like this would have an incredible amount of overhead. I think in this case, re-inventing the wheel will bite you in the ass. I imagine that on a longer string, this function will get slower and slower.
|
|
|
|

|
Because often when you reinvent the wheel, you forget to put some spokes in!
It's not just coding time, but also testing time.
|
|
|
|

|
Well... all the other answers speak for themselves, so
I will try not to add anything to it.
May be your code is shorter, and may be it works.
Nevertheless you could have written it a bit more gentle.
Like this, it seems quite a bit arrogant.
In fact, the choice of speed or space always depends on what you need.
Usually speed is more usefull, sometimes you need space (like on a small
linux firewall that should fit on a floppy, or so)
Hope you don't mind that we will use Jack Handy's code
Targys
|
|
|
|

|
a bit more gentle?
if you think jack handy's code is more gentle, fine. but i think you're one of the use-and-dont-ask kind of guys then.
even if it is the faster solution (which end-user application ever requires a million wildcard matches?) the readability of jack's procedure approaches zero. if you do not recognize this, you're either in the wrong business or talking about things you don't know about.
additionally, your 'fit on a floppy' argument is also in the wrong place: my code is shorter, and so is the compiler output.
of course i don't mind if you use jack's code. but if i were to maintain the code, i'd prefer my solution.
regards
|
|
|
|

|
The C++ Guru wrote:
a bit more gentle?
if you think jack handy's code is more gentle, fine. but i think you're one of the use-and-dont-ask kind of guys then.
I think he was talking about your arrogant attitude in your post, not your code.
-Jack
|
|
|
|
 |