
Create Regex Objects using a Kind of "meta-variables" - Quicker and Easier

10 Jan 2007, CPOL
This article describes a class VarRegex allowing you to reuse parts of regular expressions


Regexps (Perl-compatible regular expressions) are great, no doubt (refer to this wonderful article for a tutorial). The one small problem is that every regular expression pattern has to be written as a single string.

For example, suppose we want to specify a pattern for a phone number with the following rules:

  • Digits come in groups of 1 or more
  • Spaces and minus signs are used as separators
  • At least one group of digits must be present

How would the appropriate pattern look? Something like this:

// the @ sign is used in C# to prevent parsing \ as escape sequence

This means that we have groups of digits (\d+), each followed by a separator, either a minus sign or a whitespace character ([\s\-]). Such groups can occur any number of times (possibly zero), but at least one group of digits must be present (the final \d+). Well, not very difficult, but not very nice either.
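The pattern itself did not survive on this page, so here is a reconstruction based only on the description above. Treat it as a sketch: the class and method names are hypothetical, and the ^ and $ anchors are an addition so that the whole input has to match.

```csharp
using System.Text.RegularExpressions;

public static class PhonePatternSketch
{
    // Zero or more "digits + separator" groups, then a final group of digits.
    // The ^ and $ anchors are an assumption, not part of the description.
    public const string Pattern = @"^(\d+[\s\-])*\d+$";

    public static bool IsPhoneNumber(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

With this pattern, "123 568-99" and "42" are accepted, while "123-" is rejected because the string must end with a group of digits.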

Now assume that, at some point, the customer says the number may include capital letters (like 1-800-GO-TO-THE-HELL-NOW). We have to change our digit group specification in two places.
And what if we need a regex for, say, a real number in exponential format? Something like this...


... and that covers only one (full) form of the number, like 123.456E+120. But the integer or the fractional part can be omitted, and then our regex becomes really complex:


Brrr, really?

A Dream

For a long time, I had a dream :) A dream to write something like this:

    SIGNIFICANT_DIGIT = @"[1-9]";
    DIGIT = @"[0-9]";
    // ` quote is the rare special character 
    // not having its own meaning in regex syntax
    EXP_PART = @"[Ee](0|[+\-]?`INT_PART`)";
    NO_INT_EXP = @"[.,] `FLOAT_PART` `EXP_PART`";
    NO_FLOAT_EXP = @"`INT_PART` [.,] `EXP_PART`";

    // and finally

Well, many more lines of code, but:

  1. Each group of symbols is defined once and then reused, so no group is duplicated in different parts of the pattern.
  2. Each line is much shorter and contains named literals, which makes the expression easier to understand.

This article describes a class that brings a similar syntax to C# programs. It handles such expressions and returns a Regex object created from the expanded pattern.


OK, the idea is as simple as possible. We create a class that allows adding "variables". Each variable can be a plain regular expression or a regex-like expression with references to previously added variables. The pattern is then set in the same form. After that, we get a ready-made Regex object and use it however we like.

We use the ` quote to mark variables. If we want to use the quote itself (maybe someone still needs it :) ), we can write "\`".

Implementation Details

The class is called VarRegex. It has a nested enumerable class, VariablesCollection, built around a Dictionary<String, String>. This class allows adding and modifying variables through an indexer property, retrieving their Count, clearing the list of variables with Clear, and enumerating their values. The main method of VariablesCollection is called Expand. It receives a string to be "expanded", looks for occurrences of variable names, and replaces each variable reference with that variable's expanded value.
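To make the description above more concrete, here is a minimal sketch of what such a collection could look like. It is an assumption based only on the paragraph above, not the actual source: the class name VariablesCollectionSketch is hypothetical, it is not nested in VarRegex, and Expand is left out.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of a dictionary-backed variables collection.
public class VariablesCollectionSketch : IEnumerable<string>
{
    private readonly Dictionary<string, string> variables =
        new Dictionary<string, string>();

    // Adds a new variable or overwrites an existing one.
    public string this[string name]
    {
        get { return variables[name]; }
        set { variables[name] = value; }
    }

    public int Count
    {
        get { return variables.Count; }
    }

    public void Clear()
    {
        variables.Clear();
    }

    // Enumerates the stored (unexpanded) values.
    public IEnumerator<string> GetEnumerator()
    {
        return variables.Values.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Because the setter goes through the dictionary indexer, assigning to an existing name replaces its value instead of throwing, which matches the "adding and modifying" behavior described above.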

The method is implemented in the following way:

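// requires: using System.Text.RegularExpressions;
public String Expand(String pattern)
{
    if (pattern == "")
        return "";

    string p = pattern;
    // hide escaped quotes (\`) behind a placeholder character
    p = p.Replace("\\`", "" + (char)1);

    // find every `name` reference in the pattern
    Regex r = new Regex("`([^`]+)`");
    MatchCollection ms = r.Matches(p);

    foreach (Match match in ms)
    {
        string t = match.Groups[1].Value;
        // replace the reference with the recursively expanded value
        p = p.Replace("`" + t + "`", Expand(variables[t]));
    }

    // turn the placeholders back into plain quotes
    p = p.Replace("" + (char)1, "`");

    return p;
}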

First, we hide the escaped ("fake") quotes by replacing each \` with a placeholder character. Then we look for all quoted variable names and replace each reference with the variable's expanded value. Finally, we turn the placeholders back into plain quotes (without the backslashes). Well, rather easy. Each time the variables or the pattern change, the Regex object inside our VarRegex object is recreated. The class VariablesCollection also has a nested enumerable class, ExpandedVariablesCollection, which allows enumerating the expanded variable values or retrieving them by name.


Now the code that generates the regex for the phone number from the introduction looks like this:

VarRegex vr = new VarRegex();
vr.Variables["int"] = @"\d+";
vr.Variables["sep"] = @"[\s\-]";
// grouped so that a quantifier placed after `gr` applies to the whole group
vr.Variables["gr"] = @"(?:`int``sep`)";
vr.Pattern = @"`gr`*`int`";
vr.Options = RegexOptions.IgnoreCase;

string str = @"123 568-99";
Match m = vr.Regex.Match(str);
Console.WriteLine("Result for string {1}: {0}\n", m.Success, str);
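One thing to keep in mind when quantifying a variable reference: if expansion is plain textual substitution, as the Expand method shown earlier suggests, a quantifier written after a reference binds only to the last atom of the expanded text unless the variable's value is wrapped in a group. The sketch below compares the two expanded forms for a "digits + separator" variable followed by *. The class name, the constants, and the MatchesWhole helper are hypothetical and written only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class GroupingSketch
{
    // "`gr`*`int`" with gr = "`int``sep`" under plain textual substitution:
    // the * binds only to the separator class.
    public const string Ungrouped = @"\d+[\s\-]*\d+";

    // The same pattern with gr = "(?:`int``sep`)":
    // the * now repeats the whole "digits + separator" group.
    public const string Grouped = @"(?:\d+[\s\-])*\d+";

    // Anchors the pattern so that the entire input has to match.
    public static bool MatchesWhole(string pattern, string input)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }
}
```

For the input "123 568-99", MatchesWhole returns true with Grouped and false with Ungrouped. An unanchored Regex.Match reports success for both, because the ungrouped form still matches the leading "123 568".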


The main limitation is that variables must be added in the order in which they are referenced: a variable can be added to the VarRegex only after all the variables it references have been added.
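A related point: a recursive expansion never terminates if variables reference each other in a cycle (for example, a refers to b and b refers back to a). One possible guard is to track the names currently being expanded and fail fast when a name shows up again. The following is a standalone sketch under stated assumptions, not part of VarRegex: the class name, the use of Regex.Replace with a callback, and the choice of exception are all illustrative, and the handling of escaped quotes is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SafeExpandSketch
{
    public static string Expand(string pattern, IDictionary<string, string> variables)
    {
        return Expand(pattern, variables, new HashSet<string>());
    }

    private static string Expand(string pattern,
        IDictionary<string, string> variables, HashSet<string> inProgress)
    {
        // Replace every `name` reference with its recursively expanded value.
        return Regex.Replace(pattern, "`([^`]+)`", delegate(Match m)
        {
            string name = m.Groups[1].Value;
            // Add returns false if the name is already being expanded.
            if (!inProgress.Add(name))
                throw new InvalidOperationException(
                    "Circular reference to variable '" + name + "'.");
            string expanded = Expand(variables[name], variables, inProgress);
            inProgress.Remove(name);
            return expanded;
        });
    }
}
```

The name is removed from the set once its expansion is finished, so a variable may still be referenced several times in one pattern; only a reference to a variable that is still being expanded is rejected.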


  • 10th January, 2007: Initial post


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Eugene Mirotin (Guard)
Software Developer
Belarus
No Biography provided

Comments and Discussions

Nice work (Light Walker, 18-Jan-07 4:22)
I have the same dream, and I did similar work many years ago. But I found a problem that regular expressions cannot solve: if you define a symbol that contains another symbol recursively, so that the symbols form a ring, the *expand* function will die of a stack overflow.

And the only solution is to read more *Compiler Principles* -_-|||, give up regular expressions for very complex text formats, or use ANTLR or some other language recognizer.


Article Copyright 2007 by Eugene Mirotin (Guard)