Posted 26 Sep 2008

What is the best way to create Regular Expressions?

Alex Perepletov, 26 Sep 2008
A convenient way to document the intent of each part of a regex.

Regular Expressions are notorious for being confusing to read and understand. The longer the Regular Expression, the higher the chance of making a mistake in it, and the more difficult it is to debug or modify. Even if every Regular Expression were commented thoroughly, it would still suffer from being a single long line of characters.

Consider this Regular Expression that is found at

(?s)( class=\w+(?=([^<]*>)))|(<!--\[if.*?<!\[endif\]-->)|
  (<!\[if !\w+\]>)|(<!\[endif\]>)|(<o:p>[^<]*</o:p>)|

There's nothing wrong with the expression itself. Unfortunately, no matter how thoroughly we document it, we cannot easily associate a comment, visually, with the part of the Regular Expression string that it describes.

The real problem is that a single long Regular Expression line does not allow a developer to show the intent of each significant part of it. Each part of a Regular Expression must scream its purpose. If a Regular Expression is several lines long, and it does not work properly, the developer will have a hard time locating the point that is responsible for the failure.

The solution is really simple. I have not seen a similar technique used anywhere, so this feels like a good example to share. Instead of entering a Regular Expression as a single long cryptic string, the string is built dynamically as a sum of very short cryptic strings. Each short piece of Regular Expression is commented separately.
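As a rough sketch, the expression quoted above might be decomposed as follows. Only the alternatives visible in the quotation are included. The class name `WordMarkupPattern`, the method `Build`, and the per-piece descriptions are assumptions made for illustration; the descriptions are inferred from reading the pattern and are not stated by its source.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: names and comments are illustrative assumptions.
public static class WordMarkupPattern
{
    public static string Build()
    {
        StringBuilder pattern = new StringBuilder();

        // Single-line mode, so that '.' also matches line breaks
        pattern.Append(@"(?s)");
        // A class attribute that sits inside a tag (a '>' follows before any '<')
        pattern.Append(@"( class=\w+(?=([^<]*>)))");
        // or a conditional comment block, from <!--[if to <![endif]-->
        pattern.Append(@"|(<!--\[if.*?<!\[endif\]-->)");
        // or an opening conditional marker such as <![if !supportLists]>
        pattern.Append(@"|(<!\[if !\w+\]>)");
        // or a closing conditional marker
        pattern.Append(@"|(<!\[endif\]>)");
        // or an <o:p> element with no nested tags
        pattern.Append(@"|(<o:p>[^<]*</o:p>)");

        return pattern.ToString();
    }
}

public static class WordMarkupDemo
{
    public static void Main()
    {
        // Strip every match from a small sample fragment.
        string html = "<p class=MsoNormal>Hi<o:p></o:p></p>";
        Console.WriteLine(Regex.Replace(html, WordMarkupPattern.Build(), ""));
    }
}
```

If one alternative misbehaves, only the matching `Append` line and its comment need to be inspected.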

For example, the following class creates a regex to validate a Canadian postal code:

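using System.Text;

public class CanadianPostalCodeRegex
{
    /// <summary>
    /// Canadian postal code regular expression pattern.
    /// </summary>
    private string _strPattern;

    /// <summary>
    /// Singleton access.
    /// </summary>
    private static readonly CanadianPostalCodeRegex Instance = new CanadianPostalCodeRegex();

    private CanadianPostalCodeRegex()
    {
        StringBuilder patternBuilder = new StringBuilder();

        // Pattern description (the literals are reconstructed from these
        // comments, so the exact character classes are an assumption):
        // Start of string.
        patternBuilder.Append("^");
        // Start the FSA group
        patternBuilder.Append("(");
        // FSA group consists of ANA, where A is a letter and N is a digit
        patternBuilder.Append("[A-Za-z][0-9][A-Za-z]");
        // End the FSA group
        patternBuilder.Append(")");
        // An optional single white space
        patternBuilder.Append(@"\s?");
        // Start the LDU group
        patternBuilder.Append("(");
        // LDU group consists of NAN, where A is a letter and N is a digit
        patternBuilder.Append("[0-9][A-Za-z][0-9]");
        // End the LDU group
        patternBuilder.Append(")");
        // End of string.
        patternBuilder.Append("$");

        _strPattern = patternBuilder.ToString();
    }

    /// <summary>
    /// Gets the Canadian postal code regex pattern.
    /// </summary>
    public static string Pattern
    {
        get { return Instance._strPattern; }
    }
}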

A Regular Expression is created piece by piece. Each smallest meaningful unit is thoroughly commented. The intention of each part is crystal clear, which is a huge help when one needs to fix or modify the regex. At all times, we need to deal with a fairly small regex string, instead of an unwieldy cryptic monster.
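To see such a pattern in use, here is a minimal sketch that rebuilds it in a compact form and reads the two groups back from a match. The character classes `[A-Za-z]` and `[0-9]` and the names `PostalCodePattern`, `Build`, and `PostalCodeDemo` are assumptions for illustration, derived from the comments in the listing.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical compact variant of the builder, for demonstration only.
public static class PostalCodePattern
{
    public static string Build()
    {
        StringBuilder b = new StringBuilder();
        b.Append("^");                      // Start of string
        b.Append("(");                      // Start the FSA group
        b.Append("[A-Za-z][0-9][A-Za-z]");  // ANA (character classes assumed)
        b.Append(")");                      // End the FSA group
        b.Append(@"\s?");                   // An optional single white space
        b.Append("(");                      // Start the LDU group
        b.Append("[0-9][A-Za-z][0-9]");     // NAN (character classes assumed)
        b.Append(")");                      // End the LDU group
        b.Append("$");                      // End of string
        return b.ToString();
    }
}

public static class PostalCodeDemo
{
    public static void Main()
    {
        string pattern = PostalCodePattern.Build();

        // Groups are numbered in the order their opening parentheses appear.
        Match match = Regex.Match("K1A 0B1", pattern);
        if (match.Success)
        {
            Console.WriteLine("FSA: " + match.Groups[1].Value);
            Console.WriteLine("LDU: " + match.Groups[2].Value);
        }

        // Two spaces between the halves should be rejected.
        Console.WriteLine(Regex.IsMatch("K1A  0B1", pattern));
    }
}
```

Because each group has its own `Append` lines, it is easy to tell which group number belongs to which half of the postal code.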

This technique also promotes the syntactic correctness of the Regular Expression. For example, a group construct can be entered first, making sure the parentheses match.

// Start the LDU group
patternBuilder.Append("(");
// End the LDU group
patternBuilder.Append(")");

Next, the group's pattern is entered.

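// Start the LDU group
patternBuilder.Append("(");
// LDU group consists of NAN, where A is a letter and N is a digit
patternBuilder.Append("[0-9][A-Za-z][0-9]"); // character classes assumed
// End the LDU group
patternBuilder.Append(")");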
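One way to check the matching parentheses mechanically is to hand the partially built pattern to the `Regex` constructor, which throws an `ArgumentException` for a malformed pattern. The sketch below assumes that approach; `PatternCheck` and `IsWellFormed` are hypothetical names, not part of the original listing.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Hypothetical helper: true when the regex engine accepts the pattern.
    public static bool IsWellFormed(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            // The Regex constructor rejects malformed patterns,
            // for example a group that is never closed.
            return false;
        }
    }
}

public static class PatternCheckDemo
{
    public static void Main()
    {
        StringBuilder patternBuilder = new StringBuilder();

        patternBuilder.Append("(");  // Start the LDU group
        Console.WriteLine(PatternCheck.IsWellFormed(patternBuilder.ToString()));

        patternBuilder.Append(")");  // End the LDU group
        Console.WriteLine(PatternCheck.IsWellFormed(patternBuilder.ToString()));
    }
}
```

A check like this fits naturally in a unit test that runs after each change to the builder.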

Because the class is a Singleton, the expression is built only once, so there is virtually no performance penalty. Readability and maintainability improve significantly.
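The build-once behaviour can be observed with a small counter. The sketch below is a variant, not the listing above: it caches a `Regex` object in a static field instead of the pattern string, and the names `CachedPostalCodeRegex`, `BuildCount`, and `BuildPattern`, along with the character classes, are assumptions for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CachedPostalCodeRegex
{
    // Counts how often the pattern is assembled (demonstration only).
    // No initializer here, so field initialization order cannot reset it.
    private static int _buildCount;

    // The static initializer runs once, before the class is first used.
    private static readonly Regex Instance = new Regex(BuildPattern());

    public static int BuildCount
    {
        get { return _buildCount; }
    }

    public static bool IsMatch(string candidate)
    {
        return Instance.IsMatch(candidate);
    }

    private static string BuildPattern()
    {
        _buildCount++;
        StringBuilder b = new StringBuilder();
        b.Append("^");                        // Start of string
        b.Append("([A-Za-z][0-9][A-Za-z])");  // FSA group, ANA (classes assumed)
        b.Append(@"\s?");                     // An optional single white space
        b.Append("([0-9][A-Za-z][0-9])");     // LDU group, NAN (classes assumed)
        b.Append("$");                        // End of string
        return b.ToString();
    }
}

public static class CachedRegexDemo
{
    public static void Main()
    {
        for (int i = 0; i < 1000; i++)
        {
            CachedPostalCodeRegex.IsMatch("K1A 0B1");
        }
        // Reports how many times the pattern was assembled.
        Console.WriteLine(CachedPostalCodeRegex.BuildCount);
    }
}
```

Caching the `Regex` object also avoids re-parsing the pattern on every call, at the cost of exposing a matcher rather than the raw pattern string.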


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Alex Perepletov
Software Developer (Senior)
Canada
Article Copyright 2008 by Alex Perepletov