
A Regular Expression Wrapper Using ATL in C++

10 Dec 2008
An article on an easy-to-use regular expression wrapper.

Introduction

Regular expressions are widely used for parsing and analyzing data. For example, a regular expression can be used to extract all the links from a web page.

There are many regular expression libraries for C++. The one I used is CAtlRegExp, which is provided by ATL in Microsoft Visual Studio 2005.

Note: CAtlRegExp is defined in atlrx.h, which ships only with Visual Studio 2005. However, you can also use it in Visual Studio 2008 by copying atlrx.h to C:\Program Files\Microsoft Visual Studio 9.0\VC\atlmfc\include\ or to your project folder.

Background

An STL vector is used for the output because it is easy to use and fast enough for this situation. If you are not familiar with the STL vector, you may want to read about it first.

Using the Code

The supported regular expression syntax is described in the documentation for the CAtlRegExp class. My code contains only one function. It parses the source and pushes the results into a vector.

#include <string>
#include <vector>

using std::vector;
using std::wstring;

/*
 * Parameters
 *  [in] regExp: Value of type wstring which is the input regular expression.
 *  [in] caseSensitive: Value of type bool which indicates whether the parse is
 *                      case sensitive.
 *  [in] groupCount: Value of type int which is the group count of the regular expression.
 *  [in] source: Value of type wstring reference which is the source to parse.
 *  [out] result: Value of type vector of wstrings which is the output of the parse.
 *  [in] allowDuplicate: Value of type bool which indicates whether duplicate items
 *                       are added to the output result.
 *
 * Return Value
 *  Returns true if the function succeeds, or false otherwise.
 *
 * Remarks
 *  The output result is divided into groups. The caller should read the groups
 *  according to the group count. For example:
 *  1. RegExp = L"{ab}", source = L"abcabe", then result = L"ab", L"ab".
 *  2. RegExp = L"{ab}{cd}", source = L"abcdeabcd", then result = L"ab", L"cd", L"ab",
 *              L"cd".
 */
bool ParseRegExp(const wstring &regExp,
                 bool caseSensitive,
                 int groupCount,
                 const wstring &source,
                 vector<wstring> &result,
                 bool allowDuplicate = false);

The comments should explain the usage clearly, so let's look at some examples.

  1. Get the product name from the string "product: Bowling ball; price: $199;".
wstring source = L"product: Bowling ball; price: $199; ";
wstring regExp = L"product: {.*?};";
vector<wstring> result;
if (ParseRegExp(regExp, false, 1, source, result)
    && result.size() > 0)
{
    wprintf(L"Product name: %ls\n", result[0].c_str());
}

Pretty simple, right?

  2. Now let's look at a more complex example. Sometimes we need to parse the options of a select element in a web page. The HTML code is as follows.
<select name="imagesize" style="margin:2px 0" onchange="_isr_load(this)">
    <option value="/images?q=test&imgsz=" selected>All image sizes</option>
    <option value="/images?q=test&imgsz=huge" >Extra Large images</option>
    <option value="/images?q=test&imgsz=xxlarge" >Large images</option>
    <option value="/images?q=test&imgsz=small|medium|large|xlarge" >
        Medium images</option>

    <option value="/images?q=test&imgsz=icon" >Small images</option>
</select>

The source code is as follows.

wstring source = …;  // the <select> HTML shown above
wstring regExp = L"<select.*?>{.*?}</select>";
vector<wstring> optionsAllResult;
if (ParseRegExp(regExp, false, 1, source, optionsAllResult, false)
    && optionsAllResult.size() == 1)
{
    regExp = L"<option value=\"{.*?}\".*?>[\r\t\n ]*{.*?}[\r\t\n ]*</option>";
    vector<wstring> optionsResult;
    if (ParseRegExp(regExp, false, 2, optionsAllResult[0], optionsResult)
        && optionsResult.size() > 0
        && optionsResult.size() % 2 == 0)
    {
        for (vector<wstring>::size_type index = 0; index < optionsResult.size(); index += 2)
        {
            wprintf(L"Option: %ls\n", optionsResult[index + 1].c_str());
            wprintf(L"Value: %ls\n", optionsResult[index].c_str());
            wprintf(L"\n");
        }
    }
}

The output is:
    Option: All image sizes
    Value: /images?q=test&imgsz=

    Option: Extra Large images
    Value: /images?q=test&imgsz=huge

    Option: Large images
    Value: /images?q=test&imgsz=xxlarge

    Option: Medium images
    Value: /images?q=test&imgsz=small|medium|large|xlarge

    Option: Small images
    Value: /images?q=test&imgsz=icon

Points of Interest

I set the compiler warning level to Level 4 and enabled the option "Treat warnings as errors". This really helped me.

History

Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

shicheng
Software Developer, Philips
China
Cheng Shi is a software developer in China. He is interested in COM, ATL, Direct3D, etc. He is now working for Philips.
 
Cheng Shi loves Formula 1 and watches every Grand Prix. He dreams of being a racing car driver and hopes his dream will come true.

Last Updated 10 Dec 2008
Article Copyright 2008 by shicheng
Everything else Copyright © CodeProject, 1999-2014