
Use regular expressions in your C++ program

By Sherwood Hu, 25 Jan 2001
 

Introduction

Regular expressions provide a convenient way to specify complicated string patterns for searching, replacing, or validating text input. Because they are so useful, many people have written their own libraries. Many of the libraries I found are buggy, and debugging their source code takes a lot of time. However, you do not need to search any further: there is already one built into your computer for free, the regular expression parser written by Microsoft.

As with many other utilities, Microsoft exposes this functionality through a COM interface. You can easily find this COM server object, named Microsoft VBScript Regular Expressions 5.5, with the OleView tool.

One problem is that there is no separate type library file for this DLL. Fortunately, this is not a big deal, because we have the IDL definition: in OleView, save the IDL definition to a file and compile it with MIDL to get a type library. We can then use this type library in our C++ program. Suppose the file is named RegExp.tlb.

Using the regular expression object

You can find all the documentation either in MSDN or at http://msdn.microsoft.com/scripting/default.htm?/scripting/vbscript/doc/vsobjregexp.htm. Although the object is documented for scripting, you can still use it directly from C++ with the help of the #import directive in Visual C++ 6. Generally, you define a pattern and then either call Test to check whether the input string contains a match, or call Execute to retrieve the matches.

To demonstrate its usage, we wrote a custom DDX routine that validates the input of a control in a dialog. Its prototype is:

void AFXAPI DDX_RegExp(CDataExchange* pDX, int nIDC, LPCTSTR lpszPattern, CString& value);

If the control's text exactly matches the specified pattern (lpszPattern), validation passes; otherwise, a message box pops up.

#import "RegExp.tlb" no_namespace
...
void AFXAPI DDX_RegExp(CDataExchange* pDX, int nIDC, LPCTSTR lpszPattern, CString& value)
{
	try
	{
		// Not static: a static smart pointer would call Release()
		// after COM has been uninitialized at program exit.
		IRegExpPtr regExp( __uuidof(RegExp) );
		regExp->Pattern = _bstr_t(lpszPattern);

		HWND hWndCtrl = pDX->PrepareEditCtrl(nIDC);
		if (pDX->m_bSaveAndValidate)
		{
			// Copy the control's text into value
			int nLen = ::GetWindowTextLength(hWndCtrl);
			::GetWindowText(hWndCtrl, value.GetBufferSetLength(nLen), nLen + 1);
			value.ReleaseBuffer();

			// Now we verify it: the single match must start at 0 and cover the whole input
			if ( regExp->Test( (LPCTSTR)value ) )
			{
				IMatchCollectionPtr matches = regExp->Execute( (LPCTSTR)value );
				if ( matches->Count == 1 )
				{
					IMatchPtr match = matches->Item[0];
					if ( match->FirstIndex == 0 && match->Length == value.GetLength() )
					{
						return;
					}
				}
			}
			CString strMsg = CString(_T("The input does not exactly match the pattern ")) + lpszPattern;
			pDX->m_pDlgWnd->MessageBox(strMsg);
			pDX->PrepareEditCtrl(nIDC);
			pDX->Fail();	// throws, leaving the focus on the offending control
		}
		else
		{
			// Transfer the variable's value into the control
			::SetWindowText(hWndCtrl, value);
		}
	}
	catch (_com_error& e)
	{
		AfxMessageBox( e.ErrorMessage() );
	}
}

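In the code above, we first call the Test method to see whether there is a match. If there is one, we call the Execute method to retrieve the matches. Because the Global property is left at its default (false), Execute returns at most one match, and that match must start at index 0 and cover the entire input.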
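One caveat with this check: it accepts the input only if the first match happens to cover the whole string, and a backtracking engine that tries alternatives in order can stop at a shorter match. The sketch below illustrates the difference with the standard C++11 std::regex (ECMAScript grammar) rather than the VBScript object, so it runs anywhere; the helper names are hypothetical. Anchoring the pattern as ^(?:pattern)$ makes the engine itself reject partial matches. We assume the VBScript 5.5 engine, which also supports (?:...) groups, behaves the same way.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Mirrors the DDX_RegExp check: take the first match and accept the input
// only if that match starts at 0 and spans the whole string.
bool FirstMatchCoversInput(const std::string& input, const std::string& pattern)
{
    std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(input, m, re))
        return false;
    return m.position(0) == 0 &&
           static_cast<std::size_t>(m.length(0)) == input.size();
}

// Anchors the pattern instead, so the engine must consume the entire input
// and will backtrack into later alternatives to do so.
bool AnchoredMatch(const std::string& input, const std::string& pattern)
{
    std::regex re("^(?:" + pattern + ")$");
    return std::regex_search(input, re);
}
```

For example, with the pattern a|ab and the input ab, the first match is just a, so FirstMatchCoversInput rejects the input, while AnchoredMatch accepts it.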

Once the function is defined, we can use it in our MFC application. Note that you must initialize the COM library first (for example, by calling AfxOleInit in your application's InitInstance). The following line validates an edit box against a phone number format:

DDX_RegExp(pDX, IDC_INPUT, _T("\\d{3}-\\d{3}-\\d{4}"), m_strInput);

In this way, you can write validation code for more complicated patterns. COM makes things a lot easier.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Sherwood Hu

United States
No Biography provided


Comments and Discussions

 
IRegExpPtr : undeclared identifier (ajitatif angajetor, 26-Jan-09 0:40)
The title is the error message I get at compile time. Here's what I've done; please tell me what is wrong or missing:
 
- Opened vbscript.dll in VS and got the second (#2) typelib exported as "vbsRegEx.tlb"
- Added .tlb file to project (using "add existing item")
- Created custom build step to copy the tlb file into the Debug (or Release, according to build configuration) directory
- Opened the "oleview", found "Microsoft VBScript Regular Expressions 5.5" and got IDL definition
- Saved the IDL definition to file "vbsRegEx.idl"
- Added #import "Debug\vbsRegEx.tlb" no_namespace line at the top of the .cpp file. (The build configuration is Debug, so the path is correct, it does not give "not found" error)
 
What is missing?
Re: IRegExpPtr : undeclared identifier [modified] (AndreBroz, 14-Jul-09 2:28)
Instead of all the steps you described, I just use #import <vbscript.dll> tlbid(2) no_namespace to work with VBScript_RegExp_10, or #import <vbscript.dll> tlbid(3) no_namespace to work with VBScript_RegExp_55.
 
modified on Tuesday, July 14, 2009 8:43 AM

Re: IRegExpPtr : undeclared identifier (ineox, 24-Mar-10 2:40)
Nice example: http://cplusplus.net.ru/usage-vbscript-regexp-cpp/
Neo

Global parameter won't work (for me) (arkyc, 6-May-03 17:37)
Hi All,
 
I'm having problems with the Global parameter for the Replace method. Whatever I do, only the first instance of the text I want to replace is replaced.

My code is below. The return value of the "replacetext" function is:
"hi there stupid person. you are funny another test" and not "hi there stupid person. you are stupid another test"
as I expected. I thought setting the Global parameter to true would replace all instances of "funny" with "stupid". Am I misunderstanding something? :(

I am using the type library generated by opening vbscript.dll as a resource, expanding the "TYPELIB" branch, right-clicking the second typelib resource, selecting Export, and then saving the type library as RegExp.tlb.

It also doesn't work if I generate wrapper classes from the type library instead of using the #import method.

Any help is appreciated. ;)
 

#import "regexp.tlb" named_guids no_namespace

CString replacetext(CString p_sText, CString p_sTextToReplace, CString p_sReplacementText, bool p_bReplaceAll)
{
    CString l_strResult;

    try
    {
        IRegExpPtr regExp( __uuidof(RegExp) );

        regExp->put_Pattern(_bstr_t(p_sTextToReplace));

        if (p_bReplaceAll)
        {
            regExp->put_Global((VARIANT_BOOL)TRUE);
        }

        _bstr_t l_bstrReplace(p_sReplacementText);
        _bstr_t l_bstrSource(p_sText);

        BSTR l_bstrResult;

        l_bstrResult=l_strResult.AllocSysString();

        // regExp->Replace(l_bstrSource, l_bstrReplace, &l_bstrResult);

        l_bstrResult = regExp->Replace(l_bstrSource, l_bstrReplace);

        _bstr_t l_bstrResult2(l_bstrResult);

        l_strResult=(TCHAR*)l_bstrResult2;
    }
    catch (_com_error& e)
    {
        AfxMessageBox( e.ErrorMessage() );
    }

    return(l_strResult);
}

void CTesterDlg::OnButton1()
{
    CString l_strResult=replacetext("hi there funny person. you are funny another test", "funny", "stupid", true);
}

Re: Global parameter won't work (for me) (arkyc, 6-May-03 17:43)
Oh yeah, in case you were wondering: yes, COM is initialized. It's just not shown in the example I posted.
 
regards
 
ArkyC
Re: Global parameter won't work (for me) (AndreBroz, 14-Jul-09 2:18)
I ran into the same problem. It happens because TRUE is defined as 1, while VARIANT_TRUE is -1. Use VARIANT_TRUE instead of TRUE; otherwise the property is not set.
How can I know all the members of these classes? What form are they in? (Davidlou, 13-Mar-02 7:18)
How can I find out all the members of these classes, and what form do they take?
I can't figure out why my matches.Count is only 1 (impossible, there are actually many matches! Sigh!)

Thanks! :(
 
Sincerely, Davidlou
davidlou68@hotmail.com
How to uninitialize the object (NIrving, 18-Nov-01 0:01)
Hi all,
When I try to uninitialize the object, it core dumps. Does anybody know how I can release the reference I have created, so that I can use this without crashing the app?
 
==n
Re: How to uninitialize the object (Philip Patrick, 31-Jan-02 7:57)
If the object is a ***Ptr type (IMatchStringPtr, for example), you don't need to release it; that is done automatically. Release all other interface pointers as usual, e.g. InterfacePointer->Release().
 
Philip Patrick
"Two beer or not two beer?" (Shakesbeer)
Web-site: www.saintopatrick.com
Re: How to uninitialize the object (Matt Fitzgerald, 14-Feb-02 13:16)
I used ATL's smart pointer class CComPtr<IRegExp> to handle the interface reference counting for me. In fact, I wrote a wrapper class that looks something like this (very simplified!):
 
class CRegExp
{
public:
   CRegExp() { m_pRegExp.CoCreateInstance(__uuidof(RegExp)); }
 
private:
   CComPtr<IRegExp> m_pRegExp;
};
 
Then within the methods of my class I just use the internal smart pointer like any other COM interface pointer.
 
Of course you still have to import the type library and do your COM initialization, error handling etc.
 
Hope that helps!
regex (Tomaz Stih, 4-Mar-01 23:45)
For use within your C++ code (to avoid COM), you might also use the Boost regex library: http://www.boost.org/libs/regex/index.htm.
 
Tomaz
 
Tomaz Stih, B.Sc.SE
Re: regex (lorenzo, 4-Jan-02 3:27)
I tried this library while writing an ISAPI filter for IIS URL rewriting, but I got memory leaks. :confused:
 
lrx
About the type library... (Patrick Dell'Era, 21-Oct-00 11:00)
Of course there is a type library associated with the RegExp class. If there weren't, OleView could not present a MIDL script for it.
 
The type library is housed in %SYSTEM%\vbscript.dll. Unfortunately, vbscript.dll cannot be #imported directly to get the RegExp definitions. The problem is that #import reads the first type library found in the specified file, and RegExp's type library is actually the second one in vbscript.dll's resources.
 
To retrieve the original type library, open vbscript.dll as a resource from within DevStudio. Expand the "TYPELIB" branch, right click on the second typelib resource, select Export, then save the type library as RegExp.tlb.
 
This eliminates any chance of losing fidelity through a decompile/recompile cycle.
Re: About the type library... [modified] (AndreBroz, 13-Jul-09 21:42)
As far as I understand, to use VBScript_RegExp_10 you should use the 2nd type library, and for VBScript_RegExp_55 the 3rd. The 1st type library is VBScript_Global.
Some posts suggest using #import <vbscript.dll> tlbid(2) to select a type library from a dll.
 
modified on Tuesday, July 14, 2009 8:48 AM

Some details (Roberto Guerzoni, 17-Oct-00 5:50)
In order to use the published code:
1- #include <atlconv.h> to use USES_CONVERSION
2- specify no_namespace with #import:
#import "RegExp.tlb" no_namespace
3- remove static from IRegExpPtr regExp( __uuidof( RegExp ) );

Let me know if I'm wrong.
Re: Some details (Sherwood Hu, 17-Oct-00 6:51)
You are right. Items 1 and 2 are necessary to make the sample work. For item 3, you can declare it as static, since this function will be called several times and creating the COM object takes some time.
This macro is no longer needed (Sherwood Hu, 17-Oct-00 7:46)
Thank you for your comment. I reviewed the code again and found that I made a mistake: the macro is not needed any more. This code was taken from a project, and I removed the code that dealt with string conversion. The code above does not need any string conversion, so the header file atlconv.h is not necessary.
Re: This macro is no longer needed (Roberto Guerzoni, 18-Oct-00 2:06)
In my test program, using static generates an access violation when the program quits.
The offending location is the reference count of the RegExp object.
Re: This macro is no longer needed (Otis B, 18-Oct-00 5:51)
Static function variables are destroyed on program exit, which is after MFC has shut down COM.
On destruction, the com_ptr_t calls Release() on its contained interface pointer.

The same problem occurs if you use a global com_ptr_t.
The solution (Sherwood Hu, 18-Oct-00 6:50)
I declared the smart pointer as static because I wanted the code to look better. To get around this problem, try adding the following lines to stdafx.cpp:

#pragma init_seg(lib)
struct ComInitializer
{
    HRESULT hr;
    ComInitializer()
    {
        hr = ::CoInitialize(NULL);
    }
    ~ComInitializer()
    {
        // Balance only a successful CoInitialize (S_OK or S_FALSE)
        if (SUCCEEDED(hr))
            ::CoUninitialize();
    }

} the_com_initializer;

and remove your own CoInitialize code. This guarantees that the COM library is initialized before any of your static objects are constructed, and uninitialized only after they have all been destroyed.
Problem with USES_CONVERSION (Roberto Guerzoni, 16-Oct-00 22:18)
I can't find any reference to the USES_CONVERSION macro. Where did you get it?
Re: Problem with USES_CONVERSION (Simon Capewell, 16-Oct-00 23:41)
It's an ATL thing. Look in atlconv.h
Clever Idea (Uwe Keim, 16-Oct-00 20:01)
But you must have the Windows Script engines (including JScript and VBScript) version 5 or higher installed.
 
They can be downloaded from msdn.microsoft.com/scripting
Re: Clever Idea (William E. Kempf, 17-Oct-00 7:02)
It's a useful COM object, but the functionality is limited, and not everyone cares to use COM.

For a more portable solution in C++ (i.e. not COM) with a lot more functionality, you should really check out Boost.Regex (formerly Regex++), found at http://www.boost.org
Re: Clever Idea (Uwe Keim, 17-Oct-00 20:26)
I examined the Regex++ library, too.

But as the author of this article correctly mentioned, it is not complete.

Doing a lot of Perl, I really miss things like lookahead (?= ) or lookbehind (?<= ).

The Microsoft RegExp object is one that supports most of the Perl-like syntax.

In addition, PCRE (the Perl Compatible Regular Expressions library) supports exactly the same syntax as Perl 5.005, although it is somewhat difficult to use.
Re: Clever Idea (A. Reskala, 11-Jul-01 20:08)
For a more portable solution in C++ (i.e. not COM) ??????
 
COM is a binary standard, and as a standard it is fully portable.
Re: Clever Idea (Todd Smith, 12-Jul-01 5:10)
heh


Last Updated 26 Jan 2001
Article Copyright 2000 by Sherwood Hu
Everything else Copyright © CodeProject, 1999-2013