DEELX - Regular Expression Engine for C++

25 Dec 2006 | CPOL
DEELX is a regular expression engine for C++ designed to be convenient and easy to use.

Downloads for C++

Download Unit for Delphi (statically linked into Delphi project)

Download ActiveX for VB

Download Dynamic Link Version

Introduction

DEELX is a simple regular expression engine coded in pure C++.

The entire source of DEELX is a single header file, deelx.h. Because there are no other .cpp or .lib files, you do not need to create a separate project for DEELX, and you do not need to worry about link problems.

DEELX is widely portable: it compiles with Visual C++ 6.0, 7.1 and 8.0 (Windows), gcc (Cygwin, Linux, FreeBSD), Turbo C++ 3.0 (DOS), C++ Builder (Windows), and others. DEELX is written as a template, so char, wchar_t and other simple types can be used as its base character type.

Features

DEELX supports Perl-compatible regular expression syntax. Beyond the basic pattern syntax, it implements several extensions:

  • Right to left match mode
  • Named capture group
  • Remark
  • Zero-width assertion
  • Independent expression
  • Conditional expression
  • Recursive expression
  • Replace operation

Ideas

The most important idea of DEELX is the concept of "Element of Regular Expression". In the source code, I call it "ELX".

I treat every kind of element as an abstract element, "ElxInterface". This interface has two methods: Match() and MatchNext(). Match() makes the first attempt to match. If Match() returns true but the result is not the one you want, calling MatchNext() discards that result and tries to find another successful match. If that result is still not what you want, keep calling MatchNext() until it returns false or you get what you want.

For example, take two elements: (.*)(a)

  1. Calling Match() on the first element (.*) lets it match all of the text. The second element (a) then fails to match, so the result of that Match() is not the one I want.
  2. The next step is to call MatchNext() on the first element (.*). This step is also called backtracking. The first element (.*) reduces its repeat count, and the second element (a) tries to match again.
  3. And so on. One possible final result: even after the first element (.*) has been reduced to zero repetitions, the second element still fails to match, so the overall regular expression fails.
  4. The other possible final result: once the first element (.*) has been reduced to a certain repeat count, the second element matches, so the overall regular expression succeeds.

The matching behavior of every kind of element can be expressed through these two operations, Match() and MatchNext().

That is DEELX's idea.
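
The (.*)(a) walkthrough can be sketched in a few lines of C++. This is a minimal illustration, not DEELX's actual source: the `Element` interface, the `DotStar` and `Literal` classes and the `MatchSequence` driver are hypothetical names, and the real ElxInterface signatures in deelx.h may differ. The sketch handles only two elements in sequence, anchored at the start of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface, loosely modeled on the Match()/MatchNext() idea.
struct Element {
    virtual ~Element() {}
    // First attempt at position pos; on success, end receives the end position.
    virtual bool Match(const std::string& text, std::size_t pos, std::size_t& end) = 0;
    // Discard the previous result and try the next alternative, if any.
    virtual bool MatchNext(const std::string& text, std::size_t& end) = 0;
};

// ".*": greedy. Match() takes everything; each MatchNext() gives back one character.
// For simplicity '.' accepts every character here, including newlines.
class DotStar : public Element {
    std::size_t start_ = 0;
    std::size_t len_ = 0;
public:
    bool Match(const std::string& text, std::size_t pos, std::size_t& end) override {
        assert(pos <= text.size());
        start_ = pos;
        len_ = text.size() - pos;
        end = start_ + len_;
        return true;
    }
    bool MatchNext(const std::string&, std::size_t& end) override {
        if (len_ == 0) return false;   // nothing left to give back
        --len_;                        // reduce the repeat count by one
        end = start_ + len_;
        return true;
    }
};

// A single literal character: only one way to match, so MatchNext() always fails.
class Literal : public Element {
    char ch_;
public:
    explicit Literal(char ch) : ch_(ch) {}
    bool Match(const std::string& text, std::size_t pos, std::size_t& end) override {
        if (pos < text.size() && text[pos] == ch_) {
            end = pos + 1;
            return true;
        }
        return false;
    }
    bool MatchNext(const std::string&, std::size_t&) override { return false; }
};

// Matches first, then second, starting at position 0.
// Whenever second fails, backtrack into first with MatchNext().
bool MatchSequence(Element& first, Element& second,
                   const std::string& text, std::size_t& end) {
    std::size_t mid = 0;
    bool ok = first.Match(text, 0, mid);
    while (ok) {
        if (second.Match(text, mid, end)) return true;
        ok = first.MatchNext(text, mid);
    }
    return false;
}
```

For the text "banana", `MatchSequence` first lets `DotStar` take all six characters, then gives one back so that `Literal('a')` can match the final 'a', and reports an end position of 6. For "xyz", `DotStar` is reduced all the way to zero repetitions and the sequence fails.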

Demo in C++

C++
#include <stdio.h>
#include "deelx.h"

int main(int argc, char * argv[])
{
    // text to search
    const char * text = "12.5, a1.1, 0.123, 178";

    // compile the pattern once: a word boundary followed by a decimal number
    static CRegexpT <char> regexp("\\b\\d+\\.\\d+", IGNORECASE | MULTILINE);

    // find the first match
    MatchResult result = regexp.Match(text);

    while( result.IsMatched() )
    {
        // print the matched substring
        printf("%.*s\n", result.GetEnd() - result.GetStart(), text + result.GetStart());

        // search again, starting where the previous match ended
        result = regexp.Match(text, result.GetEnd());
    }

    return 0;
}
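
For readers who want to cross-check the demo without deelx.h, the same scan can be approximated with the standard library's std::regex. This swaps in a different engine (ECMAScript grammar), so it is only a sketch of the find-all loop, not of DEELX itself. `FindAll` is a hypothetical helper name, and only the IGNORECASE flag is carried over (as std::regex::icase).

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-overlapping match of pattern in text,
// left to right. Uses std::regex, not DEELX.
std::vector<std::string> FindAll(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern, std::regex::icase);
    std::vector<std::string> found;

    // The iterator resumes each search where the previous match ended,
    // which is what the demo loop does by passing result.GetEnd().
    std::sregex_iterator it(text.begin(), text.end(), re);
    std::sregex_iterator last;
    for (; it != last; ++it)
    {
        found.push_back(it->str());
    }
    return found;
}
```

With the demo's text and pattern, `FindAll` returns "12.5" and "0.123": "a1.1" is skipped because \b requires a word boundary before the first digit, and "178" has no decimal point.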

Regex flag definition:

C++
enum REGEX_FLAGS
{
 NO_FLAG        = 0,
 SINGLELINE     = 0x01,
 MULTILINE      = 0x02,
 GLOBAL         = 0x04,
 IGNORECASE     = 0x08,
 RIGHTTOLEFT    = 0x10,
 EXTENDED       = 0x20,
};

Wrap for Delphi (Statically Linked into Delphi Project)

Use Borland C++ Builder to compile DEELX into a .obj file, then link this .obj file into a Delphi Unit: DEELX.dcu.

Delphi
uses
  DEELX;

var
  result:TMatchResult;
  re:TRegexpA;

begin
  result := TMatchResult.Create();
  re := TRegexpA.Create(Edit1.Text, IGNORECASE + MULTILINE); // the 2nd argument is the flags

  re.Match(Edit2.Text, result);

  if result.IsMatched() then
  begin
    Edit2.SelStart := result.GetStart();
    Edit2.SelLength := result.GetEnd() - result.GetStart();
  end
  else
  begin
    Edit2.SelLength := 0;
  end;

  re.Destroy;
  result.Destroy;
end;

Regex flags definition:

Delphi
const
  NO_FLAG        = $00;
  SINGLELINE     = $01;
  MULTILINE      = $02;
  GLOBAL         = $04;
  IGNORECASE     = $08;
  RIGHTTOLEFT    = $10;
  EXTENDED       = $20;

Wrap to ActiveX for VB

DEELX is also wrapped as an ActiveX component, so it can be used from VB or ASP.

VB
Private pos As Integer
Private re As New RegExLab.RegExp

Private Sub Command1_Click()
    re.Compile Text1.Text, "igm" ' the 2nd parameter is the flags string

    re.Match Text2.Text, pos

    If re.IsMatched Then
        pos = re.End
        Text2.SelStart = re.Begin
        Text2.SelLength = re.End - re.Begin
    Else
        pos = -1
        Text2.SelLength = 0
    End If
End Sub

The flag letters follow the style of JScript's RegExp flags:

s  -  SINGLELINE
m  -  MULTILINE
g  -  GLOBAL
i  -  IGNORECASE
r  -  RIGHTTOLEFT
x  -  EXTENDED
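
To make the letter-to-flag mapping concrete, here is a minimal sketch that turns a flag string such as "igm" into the bit mask used by the C++ and Delphi versions. `ParseFlags` is a hypothetical helper and not part of DEELX or the ActiveX wrapper. The enum values are copied from the flag definition in this article, so the enum should be left out when deelx.h is already included.

```cpp
#include <cassert>
#include <string>

// Flag values copied from the REGEX_FLAGS definition in this article.
enum REGEX_FLAGS
{
    NO_FLAG     = 0,
    SINGLELINE  = 0x01,
    MULTILINE   = 0x02,
    GLOBAL      = 0x04,
    IGNORECASE  = 0x08,
    RIGHTTOLEFT = 0x10,
    EXTENDED    = 0x20
};

// Hypothetical helper: converts flag letters into the combined bit mask.
// Returns -1 if the string contains a letter that is not in the table.
int ParseFlags(const std::string & letters)
{
    int flags = NO_FLAG;

    for (std::string::size_type i = 0; i < letters.size(); ++i)
    {
        switch (letters[i])
        {
        case 's': flags |= SINGLELINE;  break;
        case 'm': flags |= MULTILINE;   break;
        case 'g': flags |= GLOBAL;      break;
        case 'i': flags |= IGNORECASE;  break;
        case 'r': flags |= RIGHTTOLEFT; break;
        case 'x': flags |= EXTENDED;    break;
        default:  return -1;            // unknown flag letter
        }
    }

    return flags;
}
```

Under these assumptions, `ParseFlags("igm")` yields the same value as `IGNORECASE | GLOBAL | MULTILINE`, and an empty string yields `NO_FLAG`.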

DLL Version of DEELX

The DLL version of DEELX uses the stdcall calling convention for every exported function, because Visual Basic can only call stdcall functions.

The demo.zip contains two projects: one is in Visual Basic, the other is in Delphi.

References and Acknowledgements

Homepage of DEELX (I am the author): http://www.regexlab.com/deelx/

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
China
Coding since 1994, starting with BASIC. Interested in programming, databases, and website construction.
My website: http://www.regexlab.com/ - Regular Expression Laboratory
The easiest regex engine: http://www.regexlab.com/deelx/

Comments and Discussions

 
feature request
masotta, 3-May-11 23:09
Regular expressions are very good for validating user input, but validation always ends up being a two-step process:
1. verifying the "grammar" (the regex)
2. verifying the content of the individual parts of the grammar (tedious ordinary C/C++ coding)

It would be nice to be able to define "verify hooks", where a hook is a function pointer.
The function receives the text matched by the preceding part of the grammar plus additional variables, and returns 1 if the text validates or 0 if it does not.

Let's say we tell the engine that "{\I,x,x}" is associated with a function that takes one char* and 2 additional user-supplied parameters.

Then, whenever the regex engine finds the hook syntax, it calls the corresponding function with a char* pointing to the matched part and the extra parameters supplied by the user, and the match validates only if the function returns 1.

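For example:
int myValidationFunc(char* a, int b, int c)
{
    int x;
    // reject text that does not parse as an integer
    if (sscanf(a, "%d", &x) != 1)
        return 0;
    // accept only values strictly between the two bounds
    if (x > b && x < c)
        return 1;
    else
        return 0;
}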

.associateHook("{\I,x,x}",&myValidationFunc);

then if we write a regex like
([-+]?\d+){\I,5,8}

Whenever the engine matches the first () and recognizes an integer, it passes the text of that integer to the validation function associated with \I, together with the additional parameters 5 and 8. If the function returns 1 we continue; if not, we reject.

This way we can define the validation function however we want.
To avoid conflicting grammars, the regex system can predefine the valid switches as a closed set, say \A \B \C..\M or whatever.

This is a modification of the "Cheap character classes" concept implemented here:
http://www.codeproject.com/KB/string/spencerregexp.aspx

What do you think?
Can it be done?