Introduction
Regular expressions provide a convenient way to specify complicated string patterns for searching, replacing, or validating text input. Because they are so useful, many people have written their own libraries. Many of the libraries I found were buggy, and debugging their source code takes a lot of time. However, you do not need to search any further, since one is already installed on your computer for free: the regular expression parser written by Microsoft.
As with many of its other utilities, Microsoft exposes this functionality through a COM interface. The COM server object, named Microsoft VBScript Regular Expressions 5.5, is easy to find with the OleView tool.
One problem is that there is no type library associated with this DLL. Fortunately, this is not a big deal, since we have the IDL definition. In OleView, save the IDL definition to a file and compile it with MIDL to produce a type library. After that, we can use this type library in our C++ program. Suppose the resulting file is named RegExp.tlb.
Using regular expressions
You can find all the documentation in either MSDN or at the URL http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/doc/vsobjregexp.htm. Although it is written for scripting, you can use the objects directly from C++ with the help of the #import directive supported by Visual C++ 6. Generally, you define a pattern and then either test it against an input string or execute it to see whether there are any matches.
To demonstrate its usage, we wrote a custom DDX routine that verifies the input of a control in a dialog. The function prototype is as follows:
void AFXAPI DDX_RegExp(CDataExchange* pDX, int nIDC, LPCTSTR lpszPattern, CString& value);
If the control's input exactly matches the specified pattern (lpszPattern), validation passes; otherwise a message box pops up.
#import "RegExp.tlb" no_namespace
...
void AFXAPI DDX_RegExp(CDataExchange* pDX, int nIDC, LPCTSTR lpszPattern, CString& value)
{
    try {
        // Created once and reused; the pattern is reset on every call.
        static IRegExpPtr regExp( __uuidof(RegExp) );
        regExp->Pattern = _bstr_t(lpszPattern);
        HWND hWndCtrl = pDX->PrepareEditCtrl(nIDC);
        if (pDX->m_bSaveAndValidate)
        {
            // Read the control text into value.
            int nLen = ::GetWindowTextLength(hWndCtrl);
            ::GetWindowText(hWndCtrl, value.GetBufferSetLength(nLen), nLen+1);
            value.ReleaseBuffer();
            if ( regExp->Test( (LPCTSTR)value ) )
            {
                IMatchCollectionPtr matches = regExp->Execute( (LPCTSTR)value );
                if ( matches->Count == 1 )
                {
                    // Accept only a match that covers the entire input.
                    IMatchPtr match = matches->Item[0];
                    if ( match->FirstIndex == 0 && match->Length == value.GetLength() )
                    {
                        return;
                    }
                }
            }
            // Validation failed: report it and abort the data exchange.
            CString strMsg = CString(_T("The input does not exactly match the pattern ")) + lpszPattern;
            pDX->m_pDlgWnd->MessageBox(strMsg);
            pDX->PrepareEditCtrl(nIDC);
            pDX->Fail();
        }
        else
        {
            // Loading direction: nothing is copied from value into the control here.
        }
    }
    catch (_com_error& e)
    {
        AfxMessageBox( e.ErrorMessage() );
    }
}
In the code above, we first use the Test method to see whether there is a match. If there is one, we use the Execute method to retrieve the matches. There should be exactly one match, and it must start at index 0 and span the whole input.
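The same whole-input check can be tried outside of MFC and COM. The following is a minimal sketch that swaps in the C++11 std::regex library for the VBScript RegExp object, so it illustrates the logic rather than the COM calls; the function names matchesWholeInput and checkPhonePattern are made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true only when the first match of `pattern`
// starts at index 0 and is as long as `input`, mirroring the
// FirstIndex == 0 && Length == value.GetLength() test in DDX_RegExp.
bool matchesWholeInput(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern);           // ECMAScript grammar by default
    std::smatch m;
    if (!std::regex_search(input, m, re))   // stands in for Test/Execute
        return false;
    return m.position(0) == 0 &&
           static_cast<std::string::size_type>(m.length(0)) == input.size();
}

// Usage sketch with the phone number pattern from this article.
void checkPhonePattern()
{
    const std::string phone = "\\d{3}-\\d{3}-\\d{4}";
    assert(matchesWholeInput(phone, "555-123-4567"));    // exact match
    assert(!matchesWholeInput(phone, "x555-123-4567"));  // match starts at index 1
    assert(!matchesWholeInput(phone, "555-123-45678"));  // match is shorter than the input
}
```

With std::regex, std::regex_match would be the more direct way to demand a full match; regex_search is used here only to stay close to the Test/Execute flow above.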
Once this function is defined, we can use it in our MFC application. Note that you must initialize the COM library in your application first. The following call validates an input box against a phone number format:
DDX_RegExp(pDX, IDC_INPUT, _T("\\d{3}-\\d{3}-\\d{4}"), m_strInput);
In this way, you can write validation code for more complicated patterns. COM makes things a lot easier.
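As noted above, COM has to be initialized before the first call to DDX_RegExp. One possible way to do that is a small guard object, sketched below. The class name ComApartment and the function runWithCom are made up for this example, and the choice of an apartment-threaded model is an assumption. The Windows calls are wrapped in #ifdef _WIN32 so the sketch also compiles, as a no-op, on other platforms.

```cpp
#include <cassert>
#ifdef _WIN32
#include <objbase.h>                 // CoInitializeEx, CoUninitialize
#ifdef _MSC_VER
#pragma comment(lib, "ole32.lib")    // COM runtime import library
#endif
#endif

// Hypothetical RAII guard: initializes COM on construction and
// uninitializes it on destruction if initialization succeeded.
class ComApartment {
public:
    ComApartment() : ok_(true) {
#ifdef _WIN32
        // S_OK and S_FALSE both count as success and must be balanced
        // by a call to CoUninitialize.
        ok_ = SUCCEEDED(::CoInitializeEx(NULL, COINIT_APARTMENTTHREADED));
#endif
    }
    ~ComApartment() {
#ifdef _WIN32
        if (ok_) ::CoUninitialize();
#endif
    }
    bool ok() const { return ok_; }
private:
    ComApartment(const ComApartment&);             // non-copyable
    ComApartment& operator=(const ComApartment&);
    bool ok_;
};

// Usage sketch: keep the guard alive for as long as COM objects are in use.
void runWithCom()
{
    ComApartment com;
    assert(com.ok());   // real code would report the failure instead
    // ... show the dialogs that call DDX_RegExp here ...
}
```

In an MFC application, calling AfxOleInit() from InitInstance is the framework's own alternative. Because DDX_RegExp keeps a static IRegExpPtr, whichever approach is used should stay in effect for the lifetime of the application.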
SubMatches? (vikee, 6:41 7 Sep '09)
Hi,
After a long search and much trying, I still can't figure out how to retrieve groups. If I use IMatch2Ptr, then SubMatches is accessible, but the result is null.
IMatch2Ptr match = matches->Item[0]; match->SubMatches or match->GetSubMatches();
Could you please add an example of groups to your tutorial?
Thanks in advance. Regards, Viktor
Hi,
and to make the struggle shorter, I finally figured out: you need to use version 2 of these classes, starting with IRegExp2Ptr.
Viktor
Good work... (benny_thomas03, 4:04 8 Jun '08)
Not a bad idea at all! I'm using the Boost regex library in my projects, but reusing the regex library from VBScript is also a good idea on the Windows platform, as long as we manage to avoid 'DLL hell'.
Compiling the IDL? (biopsy, 17:27 21 Aug '07)
How do you compile the IDL File? Do you just drag and drop it into midl.exe?
VC++ Project? (vito333, 4:41 2 Jun '05)
What about simple VC++ project? It will be great ...
Distribution question (Anonymous, 12:44 14 May '05)
I created a DLL in Visual C++ 6.0 that uses the VBScript regular expression component, following the steps here. My EXE application links dynamically to that DLL and it works great. If I wanted to distribute this DLL to a friend, would I give him both the DLL and RegExp.tlb? Thanks
User Breakpoint (Anonymous, 10:45 9 Nov '04)
Getting a User Breakpoint called from code at 0x77f813b1 on some valid expressions. For example, .+\\(.*0.*\\).*=.*\"\" does this for me. Any ideas?
Getting the headers for RegExp 1.0 (Vince Gatto, 10:04 16 Sep '04)
Just as an FYI, if you want to be as backwards compatible as possible, you should use RegExp 1.0. This was shipped with Internet Explorer 3.0, so these days it's everywhere.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vtoriversioninformation.asp
I couldn't seem to get the type library viewer to extract the old type library, so I just used VC++ to generate the .tlh file using
#import "vbscript.dll" tlbid(2)
I usually don't include import statements in production code; instead, I modify the generated .tlh (to remove the wrappers), get rid of the .tli completely, and then add the .tlh to source control.
Great info! Unfortunately, I'm a noobie at COM, so I'm not sure how to go about getting a valid handle to the COM object to then call the functions. Do you have a quick example?
Once you have the tlh file, make sure to include it:
#include "vbscript_regexp.tlh" // I renamed this to clear up the ambiguity
And to make things look slightly cleaner you can use a namespace so you don't need to fully qualify use of the definitions in the header:
using namespace VBScript_RegExp_10;
If you're using COM at all, you should be using ATL. If you don't know much about ATL, I'm sure you can find some good articles on here. It's basically a set of lightweight template wrappers that help you use and write COM components. Here's a little code snippet that uses ATL and the RegExp object:
CComPtr<IRegExp> pRegExp;
HRESULT hr = pRegExp.CoCreateInstance(__uuidof(RegExp), NULL, CLSCTX_INPROC_SERVER);
CComBSTR bstrPattern(_T("[\\S]+[.]tmp"));
hr = pRegExp->put_Pattern(bstrPattern);
CComPtr will take care of AddRef and Release for us, so we don't need to worry about it. I also used ATL's CComBSTR to avoid having to call SysAllocString and SysFreeString when using BSTRs.
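To make the AddRef/Release bookkeeping concrete, here is a minimal sketch of the idea behind a smart pointer like CComPtr. It is not ATL's implementation: FakeUnknown, SketchPtr, and showRefCounting are made-up names for illustration, and FakeUnknown only counts references instead of implementing IUnknown.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object: it only tracks the reference count.
class FakeUnknown {
public:
    FakeUnknown() : refs_(0) {}
    unsigned long AddRef()  { return ++refs_; }
    unsigned long Release() { return --refs_; }  // a real object deletes itself at 0
    unsigned long refs() const { return refs_; }
private:
    unsigned long refs_;
};

// Minimal smart pointer in the spirit of CComPtr (illustration only).
template <typename T>
class SketchPtr {
public:
    explicit SketchPtr(T* p = nullptr) : p_(p) { if (p_) p_->AddRef(); }
    SketchPtr(const SketchPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    ~SketchPtr() { if (p_) p_->Release(); }
    SketchPtr& operator=(const SketchPtr& other) {
        if (other.p_) other.p_->AddRef();  // AddRef first so self-assignment is safe
        if (p_) p_->Release();
        p_ = other.p_;
        return *this;
    }
    T* operator->() const { return p_; }
private:
    T* p_;
};

// Usage sketch: the count rises with each copy and falls back when the
// smart pointers go out of scope, without any explicit Release call.
void showRefCounting()
{
    FakeUnknown obj;
    {
        SketchPtr<FakeUnknown> a(&obj);
        assert(obj.refs() == 1);
        SketchPtr<FakeUnknown> b(a);
        assert(obj.refs() == 2);
    }
    assert(obj.refs() == 0);
}
```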
Hopefully that makes sense.
Last Updated 26 Jan 2001
Copyright © CodeProject, 1999-2010