A Regular Expression Wrapper Using ATL in C++
An article on an easy-to-use regular expression wrapper.
Introduction
Regular expressions are widely used for parsing and analyzing data. For example, a regular expression can extract all the links from a web page.
There are many regular expression libraries for C++. The one I use here is CAtlRegExp, provided by ATL with Microsoft Visual Studio 2005.
Attention: CAtlRegExp is defined in atlrx.h, which ships only with Visual Studio 2005. However, you can also use it in Visual Studio 2008 by copying atlrx.h to C:\Program Files\Microsoft Visual Studio 9.0\VC\atlmfc\include\ or to the project folder.
Background
An STL vector is used for the output because it is easy to use and fast enough for this purpose. If you are not familiar with STL vector, you may want to read up on it first.
Using the Code
The regular expression syntax is described in the documentation for the CAtlRegExp class. My code consists of a single function, which parses the source and pushes the results into a vector.
/*
* Parameters
* [in] regExp: Value of type wstring reference which is the input regular expression.
* [in] caseSensitive: Value of type bool which indicates whether the parse is case
* sensitive.
* [in] groupCount: Value of type int which is the group count of the regular expression.
* [in] source: Value of type wstring reference which is the source to parse.
* [out] result: Value of type vector of wstrings which is the output of the parse.
* [in] allowDuplicate: Value of type bool which indicates whether duplicate items
* are added to the output result.
*
* Return Value
* Returns true if the function succeeds, or false otherwise.
*
* Remarks
* The output result is divided into groups. Callers should read the groups
* according to the group count. For example:
* 1. RegExp = L"{ab}", source = L"abcabe", then result = L"ab", L"ab".
* 2. RegExp = L"{ab}{cd}", source = L"abcdeabcd", then result = L"ab", L"cd", L"ab",
* L"cd".
*/
bool ParseRegExp(const wstring &regExp,
bool caseSensitive,
int groupCount,
const wstring &source,
vector<wstring> &result,
bool allowDuplicate = false);
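The flat output layout described in the remarks can be illustrated without ATL. The following is only a sketch under stated assumptions, not the article's implementation: it swaps CAtlRegExp for the standard library's std::wregex, so groups are written as (...) instead of {...}. The name ParseRegExpStd is hypothetical, and the allowDuplicate parameter is left out because its exact behavior is not shown here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for ParseRegExp, built on std::wregex rather than
// CAtlRegExp. Assumption: groups use ECMAScript syntax "(...)", not ATL's "{...}".
// The allowDuplicate parameter of the original declaration is intentionally omitted.
bool ParseRegExpStd(const std::wstring &regExp,
                    bool caseSensitive,
                    int groupCount,
                    const std::wstring &source,
                    std::vector<std::wstring> &result)
{
    if (groupCount < 1)
        return false;

    try
    {
        std::wregex::flag_type flags = std::regex_constants::ECMAScript;
        if (!caseSensitive)
            flags = flags | std::regex_constants::icase;
        const std::wregex re(regExp, flags);

        // The pattern must define at least groupCount capture groups.
        if (re.mark_count() < static_cast<unsigned>(groupCount))
            return false;

        // Visit every non-overlapping match and append its groups in order,
        // so the output holds groupCount entries per match.
        for (std::wsregex_iterator it(source.begin(), source.end(), re), end;
             it != end; ++it)
        {
            for (int group = 1; group <= groupCount; ++group)
                result.push_back((*it)[group].str());
        }
        return true;
    }
    catch (const std::regex_error &)
    {
        // An invalid pattern is reported as failure.
        return false;
    }
}
```

With this sketch, the pattern L"(ab)" applied to L"abcabe" appends two entries, and any pattern with two groups appends two entries per match, first group first.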
I think the comments explain the usage clearly, so let's move on to some examples.
- Get the product name from the string "product: Bowling ball; price: $199;"
wstring source = L"product: Bowling ball; price: $199; ";
wstring regExp = L"product: {.*?};";
vector<wstring> result;
if (ParseRegExp(regExp, false, 1, source, result)
&& result.size() > 0)
{
wprintf(L"product name: %s\n", result[0].c_str());
}
Pretty simple, right?
- Now a more complex one. Sometimes we need to parse the options of a select element in a web page. The HTML code is as follows.
<select name="imagesize" style="margin:2px 0" onchange="_isr_load(this)">
<option value="/images?q=test&imgsz=" selected>All image sizes</option>
<option value="/images?q=test&imgsz=huge" >Extra Large images</option>
<option value="/images?q=test&imgsz=xxlarge" >Large images</option>
<option value="/images?q=test&imgsz=small|medium|large|xlarge" >
Medium images</option>
<option value="/images?q=test&imgsz=icon" >Small images</option>
</select>
The source code is as follows.
wstring source = ...; // the HTML shown above
wstring regExp = L"<select.*?>{.*?}</select>";
vector<wstring> optionsAllResult;
if (ParseRegExp(regExp, false, 1, source, optionsAllResult, false)
&& optionsAllResult.size() == 1)
{
regExp = L"<option value=\"{.*?}\".*?>[\r\t\n ]*{.*?}[\r\t\n ]*</option>";
vector<wstring> optionsResult;
if (ParseRegExp(regExp, false, 2, optionsAllResult[0], optionsResult)
&& optionsResult.size() > 0
&& optionsResult.size() % 2 == 0)
{
for (vector<wstring>::size_type index = 0; index < optionsResult.size(); index += 2)
{
wprintf(L"Option: %s\n", optionsResult[index + 1].c_str());
wprintf(L"Value: %s\n", optionsResult[index].c_str());
wprintf(L"\n");
}
}
}
The output is:
Option: All image sizes
Value: /images?q=test&imgsz=
Option: Extra Large images
Value: /images?q=test&imgsz=huge
Option: Large images
Value: /images?q=test&imgsz=xxlarge
Option: Medium images
Value: /images?q=test&imgsz=small|medium|large|xlarge
Option: Small images
Value: /images?q=test&imgsz=icon
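The index arithmetic in the option example (stepping by 2 and reading index and index + 1) generalizes to any group count. A minimal sketch of that idea follows. SplitIntoMatches is a hypothetical helper, not part of the article's code, and it assumes the flat vector holds exactly groupCount entries per match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the flat output of ParseRegExp into one row
// per match, each row holding groupCount captured strings.
// Returns an empty vector if groupCount is 0 or the flat size is not a
// multiple of groupCount.
std::vector<std::vector<std::wstring>> SplitIntoMatches(
    const std::vector<std::wstring> &flat, std::size_t groupCount)
{
    std::vector<std::vector<std::wstring>> matches;
    if (groupCount == 0 || flat.size() % groupCount != 0)
        return matches;

    matches.reserve(flat.size() / groupCount);
    for (std::size_t i = 0; i < flat.size(); i += groupCount)
    {
        auto first = flat.begin() + static_cast<std::ptrdiff_t>(i);
        auto last = first + static_cast<std::ptrdiff_t>(groupCount);
        matches.emplace_back(first, last);
    }
    return matches;
}
```

For the select example, calling SplitIntoMatches(optionsResult, 2) would give one row per option, with the value in row[0] and the label in row[1].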
Points of Interest
I set the compiler warning level to Level 4 and turned on the "Treat warnings as errors" option. It really helped me.
History
Initial version.