|
I agree. This is excellent.
|
|
|
|
|
Hi,
wildcmp("*<*>", "<field1><field2>") returns 1, while I think it should return 0 (I may be wrong, so please tell me).
If someone knows how to fix it, I would appreciate it.
Regards
|
|
|
|
|
Sure it matches. The first '*' matches '<field1>', and '<*>' matches '<field2>'.
Regards,
Radboud
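For anyone who wants to check this case, here is a minimal C sketch of the matcher as it is described in this thread (three loops, with mp and cp remembering where to resume after the last '*'). It is reconstructed from the comments here, so treat it as an assumption about the code rather than the article's exact listing.

```c
#include <stddef.h>

/* Sketch of a three-loop wildcard matcher: '*' matches zero or more
   characters, '?' matches exactly one. Returns 1 on match, 0 otherwise. */
int wildcmp(const char *wild, const char *string) {
    const char *cp = NULL; /* where to resume in string after a failed attempt */
    const char *mp = NULL; /* pattern position right after the last '*' */

    /* Loop 1: literal prefix before the first '*'. */
    while (*string && *wild != '*') {
        if (*wild != *string && *wild != '?') {
            return 0;
        }
        wild++;
        string++;
    }

    /* Loop 2: consume the rest of the string, retrying after each mismatch. */
    while (*string) {
        if (*wild == '*') {
            if (!*++wild) {
                return 1; /* pattern ends with '*': everything left matches */
            }
            mp = wild;
            cp = string + 1;
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            /* Mismatch: let the last '*' absorb one more character. */
            wild = mp;
            string = cp++;
        }
    }

    /* Loop 3: only trailing '*' may remain in the pattern. */
    while (*wild == '*') {
        wild++;
    }
    return !*wild;
}
```

With this sketch, wildcmp("*<*>", "<field1><field2>") returns 1, because '*' is free to match '<' and '>' as well. Rejecting a string with two fields would need a stricter pattern or an extra check outside the matcher.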
|
|
|
|
|
I think it's better to make the function return a bool value. Anyway, many string comparison functions return 0 when the strings are equal.
|
|
|
|
|
If wild = "*?.abc" and str = "abc.abc",
wildcmp(wild, str) does not work,
but if wild = "?*.abc" and str = "abc.abc",
wildcmp(wild, str) does work.
Does anyone have any idea about this case?
|
|
|
|
|
I am having similar problems with "*Hallo 200? ueberalles*.ddd".
It doesn't work. I think that once the first * is finished, it does not expect another wildcard to follow in the pattern.
|
|
|
|
|
Ignore my last email;
as usual, the problem sits in front of the screen.
(I mixed a project built with multibyte chars with this code, which works on plain chars only. And of course I used an umlaut instead of 'ue' in my tests. So no wonder it crashed after the '?'.)
I'm very sorry!
|
|
|
|
|
Simple, fast, useful, AND fun to figure out.
Well done.
|
|
|
|
|
I got the overall flow of the program, but I didn't completely get the logic of the second loop. I understand that in the second loop it checks whether there is anything after the *; if there is nothing, it is a match, but if there is something, it stores the positions in the two pointers and then goes on.
Also, the final else goes like this:
else
{
wild = mp;
string = cp++;
}
I'm sorry, but I don't fully get the logic.
Can someone please explain?
|
|
|
|
|
In case you are matching something like the following:
"*.abc" to "ab.de.abc"
In the second loop it looks for the first character after the asterisk that is the same in the string. At first it matches "*" against "ab". mp = ".abc" during this. Now wild = ".abc" and string = ".de.abc". Obviously no match. On the next loop the first characters do match (both '.') and wild becomes "abc" and string "de.abc". On the next loop there is no match and it falls to the else. Here it resets wild to the last mp (mask pattern??) and string to the last cp (character pattern) WITHOUT THE FIRST CHARACTER. (It actually advances cp one position.)
Why does it do this? After matching the * against part of the string and encountering a possible position at which to match the remainder of the pattern, it continued comparing characters from both against each other. This failed. Since there was a * right before the position of mp, it is still allowed to add characters to the part that is matched by that *. Basically, it goes back to that position but decides that the character that occurs in both strings is not the next character in the pattern but part of the '*' wildcard.
In the end it has matched '*' with 'ab.de'.
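The retry step described above can be made visible with a small instrumented sketch in C. The function name wildcmp_count and the retries counter are hypothetical additions for illustration; the loop structure is assumed to be the one discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical instrumented variant of the matcher: identical logic, plus a
   counter for how often the final else branch (the retry step) runs. */
int wildcmp_count(const char *wild, const char *string, int *retries) {
    const char *cp = NULL, *mp = NULL;
    *retries = 0;

    while (*string && *wild != '*') {
        if (*wild != *string && *wild != '?') {
            return 0;
        }
        wild++;
        string++;
    }

    /* If we get here with characters left, *wild is '*', so mp and cp are
       always set before the else branch can run. */
    while (*string) {
        if (*wild == '*') {
            if (!*++wild) {
                return 1;
            }
            mp = wild;       /* pattern position right after the '*' */
            cp = string + 1; /* next place to retry from in the string */
        } else if (*wild == *string || *wild == '?') {
            wild++;
            string++;
        } else {
            /* The attempt failed: rewind the pattern to just after the '*'
               and start one character later in the string. */
            wild = mp;
            string = cp++;
            (*retries)++;
        }
    }

    while (*wild == '*') {
        wild++;
    }
    return !*wild;
}
```

For "*.abc" against "ab.de.abc" the function returns 1 and the counter ends at 5, one retry for each character of "ab.de" that the '*' ends up absorbing.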
|
|
|
|
|
|
Hello,
I think this post is very interesting because the code is very simple and does very cool work!
BUT!
I don't understand why you need 3 loops to do it.
I think I am not seeing all the cases, because to me it looks like two loops do all the work?
I'm trying to understand the whole process so that I can add an optional character with a ^ escape sequence, for example: ^-* would match -12 or 12
Thanks
|
|
|
|
|
DarkYoda Mickael wrote:
I don't understand why you need 3 loops to do it.
I think I am not seeing all the cases, because to me it looks like two loops do all the work?
The third loop:
while (*wild == '*') {
  wild++;
}
is there to take care of trailing *'s. Since * means 0 or more chars, "test*" should match "test" just fine. That loop takes care of this case.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
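As a small illustration of that third loop in isolation, here is a C sketch. The helper name rest_is_only_stars is hypothetical; it takes whatever is left of the pattern once the string has been used up.

```c
/* Hypothetical helper: given the part of the pattern that is left over once
   the string is exhausted, report whether it can still match the empty
   remainder. Only '*' characters may be left. */
int rest_is_only_stars(const char *wild) {
    while (*wild == '*') {
        wild++;
    }
    return !*wild; /* 1 if we reached the end of the pattern, else 0 */
}
```

For "test*" against "test", the leftover pattern is "*", so the helper returns 1 and the match succeeds. A leftover such as "*a" returns 0, because the 'a' still needs a character that the string no longer has.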
|
|
|
|
|
Hi, I have a stupid question: could someone give me the C# version?
Thanks in advance.
|
|
|
|
|
private bool wildcmp(string wild, string str)
{
    int cp=0, mp=0;
    int i=0;
    int j=0;
    while ((i<str.Length) && (wild[j] != '*'))
    {
        if ((wild[j] != str[i]) && (wild[j] != '?'))
        {
            return false;
        }
        i++;
        j++;
    }
    while (i<str.Length)
    {
        if (j<wild.Length && wild[j] == '*')
        {
            if ((j++)>=wild.Length)
            {
                return true;
            }
            mp = j;
            cp = i+1;
        }
        else if (j<wild.Length && (wild[j] == str[i] || wild[j] == '?'))
        {
            j++;
            i++;
        }
        else
        {
            j = mp;
            i = cp++;
        }
    }
    while (j<wild.Length && wild[j] == '*')
    {
        j++;
    }
    return j>=wild.Length;
}
This C# version works. I'm sure there are loads of improvements to be made though. Don't flame me for such bad code, I only started C# yesterday;)
|
|
|
|
|
A small fix:
while ((i<str.Length) && (wild[j] != '*'))
should be
while (i < str.Length && j < wild.Length && wild[j] != '*')
And a small improvement for case sensitivity:
private bool wildcmp(string wild, string str, bool case_sensitive)
{
if (! case_sensitive)
{
wild = wild.ToLower();
str = str.ToLower();
}
// rest of the code is the same
}
Ionut Filip
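For the C version, a similar effect can be had without copying the strings by folding case at the point where two characters are compared. The following is only a sketch; the helper name pattern_char_matches and its case_sensitive parameter are hypothetical and not part of the original function.

```c
#include <ctype.h>

/* Hypothetical helper: does pattern character w accept string character c?
   '?' accepts any character; otherwise the characters are compared, with
   case folded first when case_sensitive is 0. */
int pattern_char_matches(char w, char c, int case_sensitive) {
    if (w == '?') {
        return 1;
    }
    if (case_sensitive) {
        return w == c;
    }
    /* Cast to unsigned char: tolower() is undefined for negative values. */
    return tolower((unsigned char)w) == tolower((unsigned char)c);
}
```

A matcher could call this wherever it currently tests (*wild == *string) || (*wild == '?'). Like ToLower() on a char-based build, this only folds single-byte characters in the current C locale.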
|
|
|
|
|
hiya
Just thought I'd share my version of this code
- put the whole shebang into a class with public static methods
- fixed a bug where the pattern '?' matches all strings
- added an early-exit test for patterns that don't actually contain wildcards so it just defaults to normal string comparison
cheers
Rob
/// <summary>
/// Class providing wildcard string matching.
/// </summary>
public class Wildcard
{
    private Wildcard()
    {
    }

    /// <summary>
    /// Array of valid wildcards
    /// </summary>
    private static char[] Wildcards = new char[]{'*', '?'};

    /// <summary>
    /// Returns true if the string matches the pattern which may contain * and ? wildcards.
    /// Matching is done without regard to case.
    /// </summary>
    /// <param name="pattern"></param>
    /// <param name="s"></param>
    /// <returns></returns>
    public static bool Match(string pattern, string s)
    {
        return Match(pattern, s, false);
    }

    /// <summary>
    /// Returns true if the string matches the pattern which may contain * and ? wildcards.
    /// </summary>
    /// <param name="pattern"></param>
    /// <param name="s"></param>
    /// <param name="caseSensitive"></param>
    /// <returns></returns>
    public static bool Match(string pattern, string s, bool caseSensitive)
    {
        // if not concerned about case, convert both string and pattern
        // to lower case for comparison
        if (!caseSensitive)
        {
            pattern = pattern.ToLower();
            s = s.ToLower();
        }

        // if pattern doesn't actually contain any wildcards, use simple equality
        if (pattern.IndexOfAny(Wildcards) == -1)
            return (s == pattern);

        // otherwise do pattern matching
        int i=0;
        int j=0;
        while (i < s.Length && j < pattern.Length && pattern[j] != '*')
        {
            if ((pattern[j] != s[i]) && (pattern[j] != '?'))
            {
                return false;
            }
            i++;
            j++;
        }

        // if we have reached the end of the pattern without finding a * wildcard,
        // the match must fail if the string is longer or shorter than the pattern
        if (j == pattern.Length)
            return s.Length == pattern.Length;

        int cp=0;
        int mp=0;
        while (i < s.Length)
        {
            if (j < pattern.Length && pattern[j] == '*')
            {
                if ((j++)>=pattern.Length)
                {
                    return true;
                }
                mp = j;
                cp = i+1;
            }
            else if (j < pattern.Length && (pattern[j] == s[i] || pattern[j] == '?'))
            {
                j++;
                i++;
            }
            else
            {
                j = mp;
                i = cp++;
            }
        }

        while (j < pattern.Length && pattern[j] == '*')
        {
            j++;
        }
        return j >= pattern.Length;
    }
}
|
|
|
|
|
Thanks a lot. This is just what i've been looking for.
And it fades like the shadow in the night.
PhoeniX
|
|
|
|
|
Java version
We (Qn & Qg) just used search and replace to produce this Java version:
public static boolean matcher(String value, String pattern) {
    if (pattern == null || value == null) {
        return false;
    }
    char[] Wildcards = new char[]{'*', '?'};
    pattern = pattern.toLowerCase();
    value = value.toLowerCase();
    if (pattern.indexOf(Wildcards[0]) == -1 && pattern.indexOf(Wildcards[1]) == -1) {
        return value.equals(pattern);
    }
    int i = 0;
    int j = 0;
    while (i < value.length() && j < pattern.length() && pattern.charAt(j) != '*') {
        if (pattern.charAt(j) != value.charAt(i) && pattern.charAt(j) != '?') {
            return false;
        }
        i++;
        j++;
    }
    if (j == pattern.length()) {
        return value.length() == pattern.length();
    }
    int cp = 0;
    int mp = 0;
    while (i < value.length()) {
        if (j < pattern.length() && pattern.charAt(j) == '*') {
            if ((j++) >= pattern.length()) {
                return true;
            }
            mp = j;
            cp = i + 1;
        }
        else if (j < pattern.length() && (pattern.charAt(j) == value.charAt(i) || pattern.charAt(j) == '?')) {
            j++;
            i++;
        }
        else {
            j = mp;
            i = cp++;
        }
    }
    while (j < pattern.length() && pattern.charAt(j) == '*') {
        j++;
    }
    return j >= pattern.length();
}
Unit test
public void testmatcher() {
    System.out.println("testmatcher");
    String[][] matchPaire = {
        {"", ""},
        {"aa", "aa"},
        {"aa", "*"},
        {"a", "?"},
        {"sdwerporasl;df", "*"},
        {"absdf zzzy", "*zzy"},
        {"abc", "*?"}};
    String[][] notMatchPaire = {
        {"", "?"},
        {"ab", "?"},
        {null, null},
        {"", "*a"},
        {"bsadfasdfwer234", "a*"},
        {"a fwer234", "*a"},
    };
    for (int i = 0; i < matchPaire.length; i++) {
        System.out.print("paire " + matchPaire[i][0] + " " + matchPaire[i][1]);
        assertTrue(ExchUtils.matcher(matchPaire[i][0], matchPaire[i][1]));
        System.out.println(" ok");
    }
    for (int i = 0; i < notMatchPaire.length; i++) {
        System.out.print("paire " + notMatchPaire[i][0] + " " + notMatchPaire[i][1]);
        assertFalse(ExchUtils.matcher(notMatchPaire[i][0], notMatchPaire[i][1]));
        System.out.println(" ok");
    }
}
thank you all.
ktmt's member.
modified on Sunday, March 30, 2008 12:42 PM
|
|
|
|
|
Be aware that there is a bug in this C# version.
I am still working on figuring it out fully, but:
in this code segment
int cp=0;
int mp=0;
while (i < s.Length)
{
if (j < pattern.Length && pattern[j] == '*')
{
if ((j++)>=pattern.Length)
return true;
Going into the final "if" line shown here, the maximum value that j may have is (pattern.Length-1), due to the first "if" test. Then we see (j++) compared. But the value of (j++) is the value of "j" BEFORE being incremented, and thus is at most (pattern.Length-1) and is therefore NEVER >= pattern.Length. Only after the if test is completed is j actually incremented.
So the following return is never taken.
Perhaps it can be fixed by changing j++ to ++j... but I can't tell that until I complete the analysis.
On a slightly different topic, I will state my opinion as a professional programmer. This demonstrates the extreme importance of EXTENSIVE COMMENTS in code explaining NOT what the code does, but "what the code is supposed to do" in each section. If such comments were in place, this would be an easy maintenance fix. Without them, I have to analyze what the code DOES and, from that, try to discern what the programmer INTENDED the code to do. And I have to consider all the possible wildcard permutations just like the original programmer did. I essentially have to reinvent the wheel... because the user manual is missing.
Everyone, especially gurus, should put extensive comments in their code on "what it is intended to do". The only downside is lack of job security, because now someone other than you can fix the code. If you have that low an opinion of your worth to your employer, and also lack all compassion for others, then don't comment your code.
|
|
|
|
|
I think this:
if ((j++) >= pattern.Length)
{
return true;
}
Needs to change to this:
if (++j >= pattern.Length)
{
return true;
}
Otherwise the early break does not happen and the whole string is searched.
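The difference between the two forms is easy to check in isolation. Here is a minimal C sketch (the increment operators behave the same way in C# and Java); the function names are hypothetical and exist only to isolate the comparison.

```c
/* Hypothetical helpers isolating the comparison from the matcher.
   j is the index of the '*' in the pattern, length is the pattern length. */

/* Post-increment: compares the OLD value of j, then increments it. */
int post_increment_check(int j, int length) {
    return (j++) >= length;
}

/* Pre-increment: increments j first, then compares the NEW value. */
int pre_increment_check(int j, int length) {
    return ++j >= length;
}
```

With the '*' as the last pattern character, j equals length - 1 on entry. post_increment_check(length - 1, length) yields 0, so the early return is never taken, while pre_increment_check(length - 1, length) yields 1.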
|
|
|
|
|
Most C compare functions return zero when the values are equal, but this function returns non-zero.
Personally, I find the non-zero to be more intuitive... but after years of forcing myself to check for zero I find it a bit counter-intuitive.
I think I'll just rename the function when I add it to my library.
But that certainly won't stop me from using this wonderful routine.
Many sincere thanks ...
|
|
|
|
|
David Patrick wrote:
Most C compare functions return zero when the values are equal, but this function returns non-zero.
You make a good point. I probably should have made it behave like the strcmp() type functions. I'm a bit afraid to change it at this point since it has been posted for so long. It should be an easy fix for you or anyone else who is used to C style string comparisons. The C++ people here probably like the current behavior I would imagine.
-Jack
There are 10 types of people in this world, those that understand binary and those who don't.
|
|
|
|
|
I disagree. The return value of strcmp() is more than simply a test for equality; it tells you which string is greater than the other. A zero return value makes sense for strcmp(), but not for wildcmp(), since its return value is strictly boolean: match or no match. The current implementation is fine (although some people might be picky about the return type, int vs bool). Perhaps to avoid confusion with string comparison functions, the function should be renamed to wildmatch() or something similar.
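To show what a match-or-no-match signature could look like in C99, here is a sketch of a wildmatch() that returns bool. Note that this is a compact recursive formulation chosen to keep the example short, not the article's three-loop algorithm, and it can be much slower on patterns with many '*' characters.

```c
#include <stdbool.h>

/* Sketch of a boolean wildcard match: '*' matches zero or more characters,
   '?' matches exactly one. Recursive variant, for illustration only. */
bool wildmatch(const char *wild, const char *s) {
    if (*wild == '\0') {
        return *s == '\0'; /* pattern used up: string must be used up too */
    }
    if (*wild == '*') {
        /* Either the '*' matches nothing, or it absorbs one character of s. */
        return wildmatch(wild + 1, s) || (*s != '\0' && wildmatch(wild, s + 1));
    }
    /* Literal or '?': needs one character of s that is acceptable. */
    return *s != '\0' && (*wild == '?' || *wild == *s) && wildmatch(wild + 1, s + 1);
}
```

Call sites then read naturally as if (wildmatch("*.abc", name)) with no comparison against 0 or 1, which avoids the strcmp() confusion discussed above.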
|
|
|
|
|
You are completely right, Vic.
It is interesting that I have renamed the function in my code to wildmatch().
It would be a good new name.
Regards, Voja
|
|
|
|
|