My interest in ATL was short-lived: I've noticed that Microsoft no longer includes the ATL Server library (except for a few data encoding/decoding classes) in VC++ 2008. Unfortunately, CAtlRegExp was not one of the classes they kept.
Microsoft no longer maintains or ships ATL Server with VC++ and has released it as shared source on CodePlex.
See also: Visual C++ 2008 ATL Breaking Changes.
Just thought I'd pass this on.
Either way, thanks again for posting your article; it helped me navigate through all the details in a short time.
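Instead of ATL, you can use the new TR1 regular expression classes (std::tr1::regex in the Visual C++ 2008 Feature Pack, std::regex in later compilers), as described in http://www.developer.com/net/cplus/article.php/3746091:
#include <regex>
#include <iostream>

int main()
{
    // Case-insensitive e-mail pattern; with VC++ 2008 use std::tr1:: instead of std::
    std::regex re("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", std::regex::icase);
    std::cout << std::boolalpha << std::regex_match("GoOD@DOMAIN.COM", re)
              << std::endl;   // true
    std::cout << std::boolalpha << std::regex_match("@DOMAIN.COM", re)
              << std::endl;   // false
    return 0;
}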
Good luck!
M. Knaup
Just a minor FYI.
There is an unhandled "Access Violation" exception inside the ATL code if you call Match with an empty string.
Your sample lets this happen when the input string to search is zero length. Just thought I'd pass this on.
Otherwise, thanks for the article.
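A minimal sketch of the guard this implies: check for a NULL or empty input before handing it to the matcher. To keep it compilable without ATL, the matcher is a hypothetical callable standing in for a call such as RE.Match(szInput, &mc), and plain char is used instead of RECHAR. Note that it reports empty input as "no match", which is only correct for patterns that cannot match the empty string.

```cpp
#include <cassert>
#include <functional>

// Hypothetical wrapper (MatchNonEmpty is not an ATL function): refuses to call
// the underlying matcher on NULL or empty input, since the matcher reportedly
// faults on "". Such input is reported as "no match" instead.
bool MatchNonEmpty(const std::function<bool(const char*)>& matcher, const char* input)
{
    if (input == nullptr || input[0] == '\0')
        return false;           // never reach the matcher with empty input
    return matcher(input);
}
```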
ptrdiff_t nLength = szEnd - szStart;
I don’t understand this line.
As they are both pointers to RECHARs, I think you subtract the start of the original string from them and cast the results to integers.
I haven't tried that yet, though, but I think I will shortly.
Why not produce integers right away, so they can be used to index into the original string? Microsoft seems to like doing it the hard way...
What I did:
CString strText = _T(".....");
// ....
// ....
int nStart = (int)(szStart - (LPCTSTR)strText);
int nEnd = (int)(szEnd - (LPCTSTR)strText) - 1;
// nStart and nEnd are now indices relative to the first character of strText;
// nEnd is inclusive because szEnd points one past the last matched character
And it works...
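A portable sketch of the same pointer arithmetic, using plain char pointers instead of RECHAR/LPCTSTR (an assumption so it compiles without ATL); MatchSpan and ToSpan are hypothetical names. It assumes both pointers point into the same buffer that was passed to Match, with szEnd one past the last matched character, so the length is exactly the nLength expression quoted earlier and the inclusive end index above is offset + length - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Zero-based position and length of a match inside the searched string.
struct MatchSpan {
    std::size_t offset;  // index of the first matched character
    std::size_t length;  // number of matched characters
};

// Converts a [szStart, szEnd) range into an offset/length relative to base.
MatchSpan ToSpan(const char* base, const char* szStart, const char* szEnd)
{
    MatchSpan span;
    span.offset = static_cast<std::size_t>(szStart - base);
    span.length = static_cast<std::size_t>(szEnd - szStart);  // nLength
    return span;
}
```

Usage: for text = "id=42;" and a match of "42", ToSpan(text, start, start + 2) yields offset 3 and length 2.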
I downloaded the code, but found some unexpected bugs:
1. The regular expression "\d{4}-\d{2}-\d{2}" can't match the string 2006-12-30.
2. The regular expression "http://sports\.sina\.com\.cn/\w/\d{4}-\d{2}-\d{2}/\d+.shtml" can't match the string http://sports.sina.com.cn/g/2007-01-04/05012672491.shtml.
Can you tell me the reason, please?
Do My Best
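I think the author mentioned that the ATL regular expression class does not implement {n} (match exactly n times). In CAtlRegExp, braces delimit match groups instead, so \d{4} is read as a digit followed by a group containing a literal 4.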
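Until {n} is available, one workaround is to write the repetition out by hand: \d\d\d\d-\d\d-\d\d instead of \d{4}-\d{2}-\d{2}. The sketch below builds such a pattern with a hypothetical helper, RepeatToken, and checks it with std::regex (the standard <regex> classes, not CAtlRegExp), so it only shows that the expanded pattern means the same thing. It assumes the token is a single atom such as \d; the expanded form should also be usable with CAtlRegExp, but that has not been verified here.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: repeats `token` exactly n times,
// e.g. RepeatToken("\\d", 4) gives the pattern \d\d\d\d.
std::string RepeatToken(const std::string& token, int n)
{
    std::string out;
    for (int i = 0; i < n; ++i)
        out += token;
    return out;
}

// \d\d\d\d-\d\d-\d\d, i.e. \d{4}-\d{2}-\d{2} without the {n} quantifier.
std::string DatePattern()
{
    return RepeatToken("\\d", 4) + "-" + RepeatToken("\\d", 2) + "-" + RepeatToken("\\d", 2);
}
```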
Nice article, but here's the kicker:
Is it possible to perform a case-insensitive search using an option (similar to the Perl syntax /regex/i)?
In fact, it's Perl's job (not the regex engine's) to split the string on "/", use everything between the slashes as the regex pattern, and read the options after the last "/".
Sam
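A minimal sketch of that splitting step, using std::regex rather than CAtlRegExp: the text between the first and last "/" becomes the pattern, and an "i" after the last "/" becomes std::regex::icase. FromPerlLiteral is a hypothetical name, and only the i flag is handled. With CAtlRegExp itself, case sensitivity is chosen through the second parameter of Parse (bCaseSensitive, default TRUE), so an /i suffix would map to passing FALSE there.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns a Perl-style literal "/pattern/flags" into a
// std::regex. Slashes inside the pattern are kept, because only the first
// and the last '/' are treated as delimiters.
std::regex FromPerlLiteral(const std::string& literal)
{
    if (literal.size() < 2 || literal[0] != '/')
        throw std::invalid_argument("expected /pattern/flags");
    std::string::size_type last = literal.rfind('/');
    if (last == 0)
        throw std::invalid_argument("missing closing '/'");

    std::string pattern = literal.substr(1, last - 1);
    std::string flags = literal.substr(last + 1);

    std::regex::flag_type options = std::regex::ECMAScript;
    for (char f : flags) {
        if (f == 'i')
            options |= std::regex::icase;   // Perl's /i
        else
            throw std::invalid_argument("unsupported flag");
    }
    return std::regex(pattern, options);
}
```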
Is everybody else getting warning C4018 in atlrx.h?
Julberto
Yes, me too
For the time being, I'm using this to suppress the warning (C4018 is a signed/unsigned mismatch):
#pragma warning(push)
#pragma warning(disable: 4018)
#include <atlrx.h>
#pragma warning(pop)
I suppose the Microsoft developers writing the ATL code were a bit careless. By the way, Sam, this was a nice little article.
Julberto
I found out that matches can only be retrieved if you use groups in your regular expression. For example, use {abc} instead of abc; otherwise no matches will be displayed. The Boolean result of regex.Match(), by contrast, does not require groups.
Perl also works this way, and I think this is correct: often we want only part of the matched substring, not the whole string.
## perl code
my $s = "I am a boy and I like running";
if ( $s =~ /I am a (\w+) and I like \w+/ ) {
print "You are a [$1] and you like [$2]";
} else {
print "Not match!\n";
}
C:\>perl test.pl
You are a [boy] and you like []
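For comparison, a sketch of the same example with std::regex (not CAtlRegExp): the ungrouped \w+ is still part of the overall match, m[0], but only the parenthesized part gets a numbered slot, which is why $2 is empty in the Perl run. PerlLikeResult and MatchSentence are hypothetical names; regex_search is used because Perl's =~ searches rather than matching the whole string.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

struct PerlLikeResult {
    bool matched = false;
    std::string whole;       // m[0]: the entire match (Perl's $&)
    std::string first;       // m[1]: the first capture group (Perl's $1)
    std::size_t groups = 0;  // number of capture groups, excluding m[0]
};

PerlLikeResult MatchSentence(const std::string& s)
{
    static const std::regex re(R"(I am a (\w+) and I like \w+)");
    PerlLikeResult r;
    std::smatch m;
    if (std::regex_search(s, m, re)) {
        r.matched = true;
        r.whole = m[0].str();
        r.first = m[1].str();
        r.groups = m.size() - 1;
    }
    return r;
}
```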
I'm afraid both of you are mistaken. Sam's simple approach works well: not only grouped searches but also non-grouped searches work. Sam simply didn't display the non-grouped results; his display code only shows output when the regular expression contains groups.
A very annoying feature indeed. I am currently writing a regex find/replace dialog for end users with a very limited understanding of regular expressions. They are more likely to use simple patterns, such as the beginning or ending characters of a text, than to enter complex group patterns.
The current behaviour makes it unfit for the application I am developing.
This regex implementation has a serious bug that Microsoft has never corrected: the OR operator does not work properly. An example:
Given the simple pattern
"abc|123"
the strings "abc" or "123" should be matched... but they are not.
A regex without a proper OR operator loses much of its interest...
This is right, but it is just a workaround for the bug. It forces you to define two subgroups {...} when you may not want them.
It's not a workaround; it's a rule.
When you match 123|abc, you are telling it to find 1, then 2, then 3 or a, then b, and then c. It's a matter of precedence.
Hence, to match 123 or abc, you must type {123}|{abc}.
http://www.silveragesoftware.com/hffr.html HandyFile Find And Replace
You are right. I can see this is how CAtlRegExp works. But on the one hand, this is not documented, and on the other hand, it is specific to CAtlRegExp. Other regex implementations, such as POSIX, behave differently.
BTW, (123)|(abc) is then more appropriate than {123}|{abc} with CAtlRegExp.
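To make the difference concrete, here is a sketch with std::regex (ECMAScript syntax), where | has the lowest precedence, so 123|abc means "123" or "abc". The second function spells out, with a non-capturing group, the reading described above for CAtlRegExp, 12(3|a)bc; that mapping is taken from the explanation in this thread, not verified against atlrx.h. Both function names are hypothetical.

```cpp
#include <cassert>
#include <regex>

// Standard precedence: '|' binds loosest, so this is "123" or "abc".
bool StdAlternation(const char* s)
{
    return std::regex_match(s, std::regex("123|abc"));
}

// The tighter-binding reading described for CAtlRegExp: 1, 2, (3 or a), b, c.
bool CatlStyleReading(const char* s)
{
    return std::regex_match(s, std::regex("12(?:3|a)bc"));
}
```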
phgo wrote: this is something specific to CAtlRegExp
Yes, it is. But it's quite reasonable, to my mind
phgo wrote: BTW, (123)|(abc) is then more appropriate than {123}|{abc} with CAtlRegExp
Ah, yes... {} remembers the matched text with CAtlRegExp.
http://www.silveragesoftware.com/hffr.html HandyFile Find And Replace
If you make it {(123)|(abc)}, then I agree with you. Let me explain why {123}|{abc} is not a good approach.
Consider the following code snippet:
#include <tchar.h>
#include <cstdio>
#include <cwchar>
#include <atlrx.h>

// Assumes a Unicode build, so CAtlREMatchContext<>::RECHAR is wchar_t
int _tmain(int argc, _TCHAR* argv[])
{
    CAtlRegExp<> RE;
    CAtlREMatchContext<> RM;
    const wchar_t* R = L"{123}|{abc}";
    const wchar_t* S = L"abc";
    if (RE.Parse(R) != REPARSE_ERROR_OK) {
        wprintf(L"parse error\n");
        return 1;
    }
    if (RE.Match(S, &RM)) {
        for (UINT i = 0; i < RM.m_uNumGroups; i++) {
            const CAtlREMatchContext<>::RECHAR* szStart = 0;
            const CAtlREMatchContext<>::RECHAR* szEnd = 0;
            RM.GetMatch(i, &szStart, &szEnd);
            wprintf(L"Match group: %u\n", i);
            if (szStart == szEnd) {
                wprintf(L"Empty match group\n");
            } else {
                // szEnd points one past the last matched character
                for (const CAtlREMatchContext<>::RECHAR* p = szStart; p < szEnd; p++)
                    putwchar(*p);
                putwchar(L'\n');
            }
        }
    } else {
        wprintf(L"no match\n");
    }
    getchar();
    return 0;
}
Output produced by this code:
Match group: 0
Empty match group
Match group: 1
abc
CAtlRegExp treats the result as two groups, one of them empty. We are looking for just one group, however, so this is not the way to make an OR decision, unless you are willing to accept that one or more groups will be empty.
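The same effect can be reproduced with std::regex (not CAtlRegExp), where the analogue of {(123)|(abc)} would be (123|abc), or ((?:123)|(?:abc)). With two capture groups, only one can participate and the other is reported as unmatched; with one group around the alternation, a single slot holds whichever alternative matched. Unlike the ATL output, std::regex numbers capture groups from 1 and keeps the whole match in m[0]. GroupReport and Report are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

struct GroupReport {
    std::size_t groups;        // number of capture groups (m.size() - 1)
    std::size_t firstMatched;  // 1-based index of the first participating group, 0 if none
    std::string text;          // text captured by that group
};

GroupReport Report(const std::string& pattern, const std::string& subject)
{
    GroupReport r{0, 0, ""};
    std::regex re(pattern);
    std::smatch m;
    if (std::regex_match(subject, m, re)) {
        r.groups = m.size() - 1;
        for (std::size_t i = 1; i < m.size(); ++i) {
            if (m[i].matched) {             // skip groups that did not take part
                r.firstMatched = i;
                r.text = m[i].str();
                break;
            }
        }
    }
    return r;
}
```

For "abc", Report("(123)|(abc)", "abc") reports two groups with the first one unmatched, while Report("(123|abc)", "abc") reports a single group holding "abc".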