
I don't like Regex...

Guirec, 17 Jan 2013 CPOL
This article introduces a set of 3 simple extension methods that can help you get rid of Regex in many situations.

Introduction

In fact, I do like Regex: they do the job well. Perhaps too well, since every developer ends up having to use them and there is no way to avoid them.

Unfortunately, whenever I need a new one I face the same issue: I have forgotten almost everything about their damned syntax... If I wrote one every day I would probably remember it easily, but that is not the case, as I barely need to write a couple of them a year...

Fed up with reading and relearning the documentation again and again, I decided to implement the following string extension methods...

Background

Regular expressions are a powerful and concise means of processing large amounts of text in order to validate, extract, edit, replace or delete the parts of a text that match a predefined pattern (e.g. an email address).

In order to make proper use of Regex you need:

  • a text to analyse
  • a regular expression engine
  • a regular expression (the pattern to look for in the text to analyse)

The regular expression syntax varies depending on the regular expression engine you use. In the Microsoft world, the class that serves as the regular expression engine is System.Text.RegularExpressions.Regex, and its syntax is described here: http://msdn.microsoft.com/en-us/library/az24scfc.aspx
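As a minimal sketch of how those three ingredients fit together in .NET, the snippet below hands a text and a pattern to the Regex class. The class name RegexBasics, the sample text and the pattern are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBasics
{
    // Returns the first run of digits found in the text, or null when there is none.
    public static string FirstNumber(string text)
    {
        var engine = new Regex(@"[0-9]+");   // the engine, built from the regular expression
        Match match = engine.Match(text);    // applied to the text to analyse
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstNumber("order 1234 shipped"));          // 1234
        Console.WriteLine(Regex.IsMatch("no digits here", @"[0-9]+")); // False
    }
}
```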

If you are looking for an introduction to regular expression syntax please read this excellent article : http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial

The problem with regular expressions

Their strength is also their weakness: the syntax, concise and powerful, is designed to be friendly to regular expression engines, not really to human beings.

When you are not familiar with the syntax, you can spend a long time writing a valid expression.

You can then spend just as long testing that expression and making it bulletproof. It is one thing to make sure your regular expression matches what you expect; it is another to make sure it matches ONLY what you expect.
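To illustrate the "ONLY what you expect" part, here is a small sketch comparing an unanchored pattern with the same pattern anchored at both ends. The class name AnchorDemo and the sample inputs are assumptions chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Two digits, a slash, two digits, a slash, four digits.
    const string Loose = @"[0-9]{2}/[0-9]{2}/[0-9]{4}";
    // Same pattern, but the whole input has to match.
    const string Strict = "^" + Loose + "$";

    public static bool MatchesLoose(string s) { return Regex.IsMatch(s, Loose); }
    public static bool MatchesStrict(string s) { return Regex.IsMatch(s, Strict); }

    public static void Main()
    {
        Console.WriteLine(MatchesLoose("17/01/2013"));      // True
        Console.WriteLine(MatchesLoose("id=17/01/20134"));  // True: a fragment of the input matches
        Console.WriteLine(MatchesStrict("id=17/01/20134")); // False
        Console.WriteLine(MatchesStrict("17/01/2013"));     // True
    }
}
```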

The idea

If you are familiar with SQL, you know the LIKE operator. Why not bring that operator to C#?

Why not have a simplified syntax for the most frequent operations you would ask your Regex engine to perform?

A simplified syntax

... means fewer operators. Here is the list that I have, quite arbitrarily, come up with:

  • ? = Any single character
  • % = Zero or more characters
  • * = Zero or more word characters, so no white space (basically a word)
  • # = Any single digit (0-9)
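The four operators can be read as shorthand for native regex fragments. The sketch below lists that correspondence as a lookup table; the fragments are the ones the GetRegex listing further down in this article uses, while the class name WildcardTable is made up for illustration. Note that \w covers letters, digits and the underscore, which is slightly narrower than "anything but white space".

```csharp
using System;
using System.Collections.Generic;

public static class WildcardTable
{
    // Simplified wildcard -> .NET regex fragment.
    public static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { '?', "." },      // any single character
        { '%', ".*" },     // zero or more characters
        { '*', @"\w*" },   // zero or more word characters
        { '#', "[0-9]" }   // a single digit
    };

    public static void Main()
    {
        foreach (var pair in Map)
            Console.WriteLine(pair.Key + " => " + pair.Value);
    }
}
```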

Examples of simple expressions:

  • a Guid can be expressed as : ????????-????-????-????-????????????
  • an email address could be : *?@?*.?*
  • as for a date : ##/##/####

Regular expression aficionados are already jumping out of their chairs: obviously nothing guarantees that the last expression matches a valid date, and they are right (that expression would match 99/99/9999). But this syntax is in no way meant to replace regular expressions. It is far from offering the same level of capabilities, especially in terms of validation.
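The gap between checking a shape and validating a value can be sketched as follows. The first method is a hand-written regex equivalent of ##/##/####; the second relies on DateTime.TryParseExact and assumes, for the sake of the example, a day/month/year order. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateCheck
{
    // Shape check only: what the pattern ##/##/#### amounts to.
    public static bool LooksLikeDate(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]{2}/[0-9]{2}/[0-9]{4}$");
    }

    // Real validation, assuming the day/month/year order.
    public static bool IsRealDate(string s)
    {
        DateTime parsed;
        return DateTime.TryParseExact(s, "dd/MM/yyyy", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out parsed);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeDate("99/99/9999")); // True
        Console.WriteLine(IsRealDate("99/99/9999"));    // False
        Console.WriteLine(IsRealDate("17/01/2013"));    // True
    }
}
```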

Frequent operations

What are the frequent operations you need a regular expression engine for?

  1. determining if the text to analyse matches a given pattern: Like
  2. finding an occurrence of a given pattern in the text to analyse: Search
  3. retrieving string(s) from the text to analyse: Extract

These 3 operations, 'Like', 'Search' and 'Extract', have been implemented as string extension methods, as a simpler alternative to calling the regular expression engine directly.

Let's start by describing their usage; the code will follow...

1. Determining if a string is 'like' a given pattern  

If you know SQL, you know what I am talking about...

The Like extension simply returns true when the input string matches the given pattern.

All the following examples return true, meaning the input strings are like their patterns.

example: a string is a guid

var result0 = "A0E02391-A0DF-4772-B39A-C11F7D63C495".Like("????????-????-????-????-????????????");

example: a string ends with a guid

var result1 = "This is a guid A0E02391-A0DF-4772-B39A-C11F7D63C495".Like("%????????-????-????-????-????????????");

example: a string starts with a guid

var result2 = "A0E02391-A0DF-4772-B39A-C11F7D63C495 is a guid".Like("????????-????-????-????-????????????%");

example: a string contains a guid

var result3 = "this string A0E02391-A0DF-4772-B39A-C11F7D63C495 contains a guid".Like("%????????-????-????-????-????????????%");

example: a string ends with a guid (% also matches an empty prefix)

var result4 = "A0E02391-A0DF-4772-B39A-C11F7D63C495".Like("%????????-????-????-????-????????????");
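For comparison, here is roughly what the same checks look like when written directly against the Regex class. This is a sketch, not the article's implementation: the class name GuidShape is made up, and the hand-written pattern assumes that each ? stands for exactly one character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuidShape
{
    // Hand-written equivalent of ????????-????-????-????-????????????
    const string Shape = ".{8}-.{4}-.{4}-.{4}-.{12}";

    public static bool IsGuidShaped(string s) { return Regex.IsMatch(s, "^" + Shape + "$"); }
    public static bool EndsWithGuidShape(string s) { return Regex.IsMatch(s, "^.*" + Shape + "$"); }
    public static bool StartsWithGuidShape(string s) { return Regex.IsMatch(s, "^" + Shape + ".*$"); }

    public static void Main()
    {
        const string guid = "A0E02391-A0DF-4772-B39A-C11F7D63C495";
        Console.WriteLine(IsGuidShaped(guid));                        // True
        Console.WriteLine(EndsWithGuidShape("This is a guid " + guid)); // True
        Console.WriteLine(StartsWithGuidShape(guid + " is a guid"));  // True
        Console.WriteLine(IsGuidShaped("This is a guid " + guid));    // False
    }
}
```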

2. 'Searching' for a particular pattern in a string  

The Search extension method retrieves the first occurrence of the given pattern inside the provided text.

example: Search for a guid inside a text

var result5 = "this string [A0E02391-A0DF-4772-B39A-C11F7D63C495] contains a string matching".Search("[????????-????-????-????-????????????]");
Console.WriteLine(result5); // output: [A0E02391-A0DF-4772-B39A-C11F7D63C495]
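Written by hand, the same search needs the square brackets escaped, since they are reserved characters in the native syntax. A minimal sketch follows; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketedGuidSearch
{
    // Returns the first bracketed, guid-shaped substring, or null when there is none.
    public static string Find(string text)
    {
        Match match = Regex.Match(text, @"\[.{8}-.{4}-.{4}-.{4}-.{12}\]");
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        var text = "this string [A0E02391-A0DF-4772-B39A-C11F7D63C495] contains a string matching";
        Console.WriteLine(Find(text)); // [A0E02391-A0DF-4772-B39A-C11F7D63C495]
    }
}
```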

3. 'Extracting' values out of a string  given a known pattern 

Almost like searching, but instead of bringing back the whole string that matches the pattern, it returns a list of the substrings matched by the wildcard groups of the pattern.

example: retrieving the constituents of a guid inside a text

var result6 = "this string [A0E02391-A0DF-4772-B39A-C11F7D63C495] contains a string matching".Extract("[????????-????-????-????-????????????]");
// result is a list containing each part of the pattern: {"A0E02391", "A0DF", "4772", "B39A", "C11F7D63C495"}

example: retrieving the constituents of an email inside a text

var result7 = "this string contains an email: toto@domain.com".Extract("*?@?*.?*");
// result is a list containing each part of the pattern: {"toto", "domain", "com"}
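With the native syntax, the usual way to get the same parts is to wrap each of them in a capture group. The sketch below does this for the email example; the pattern is a hand translation of *?@?*.?* and the names EmailGroups and EmailParts are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailGroups
{
    // Returns the three captured parts of the first email-shaped match, or null.
    public static List<string> EmailParts(string text)
    {
        Match match = Regex.Match(text, @"(\w*.)@(.\w*)\.(.\w*)");
        if (!match.Success) return null;

        var parts = new List<string>();
        // Group 0 is the whole match; the captures start at index 1.
        for (int i = 1; i < match.Groups.Count; i++)
            parts.Add(match.Groups[i].Value);
        return parts;
    }

    public static void Main()
    {
        var parts = EmailParts("this string contains an email: toto@domain.com");
        Console.WriteLine(string.Join(", ", parts)); // toto, domain, com
    }
}
```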

Here's the code

The simple trick here is that the 3 public methods rely on GetRegex, which transforms the simplified expression into a valid .NET one.

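using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StringExt
{
    // True when the whole string matches the simplified pattern, as with SQL LIKE.
    public static bool Like(this string item, string searchPattern)
    {
        return GetRegex(searchPattern, true).IsMatch(item);
    }

    // First occurrence of the simplified pattern in the string, or null when there is none.
    public static string Search(this string item, string searchPattern)
    {
        var match = GetRegex(searchPattern).Match(item);
        return match.Success ? match.Value : null;
    }

    // Parts of the first occurrence that were matched by wildcards, i.e. the text found
    // between the literal parts of the pattern; null when there is no occurrence.
    public static List<string> Extract(this string item, string searchPattern)
    {
        var result = item.Search(searchPattern);
        if (!string.IsNullOrWhiteSpace(result))
        {
            var splitted = searchPattern.Split(new[] { '?', '%', '*', '#' }, StringSplitOptions.RemoveEmptyEntries);
            var temp = result;
            var final = new List<string>();
            foreach (var x in splitted)
            {
                // The regex ignores case, so the lookup of the literal must ignore it too.
                var pos = temp.IndexOf(x, StringComparison.OrdinalIgnoreCase);
                // Literal not found verbatim (e.g. a space that matched a tab): skip it.
                if (pos < 0) continue;
                if (pos > 0)
                {
                    final.Add(temp.Substring(0, pos));
                    temp = temp.Substring(pos);
                }
                temp = temp.Substring(x.Length);
            }
            if (temp.Length > 0) final.Add(temp);
            return final;
        }
        return null;
    }

    // Private method which accepts the simplified pattern and transforms it into a valid .NET regex:
    // it escapes the reserved characters of the standard regex syntax
    // and then transforms the simplified syntax into the native Regex one.
    // The order of the Replace calls matters: escaping has to run before the wildcards are expanded.
    static Regex GetRegex(string searchPattern, bool wholeString = false)
    {
        var pattern = searchPattern
                .Replace("\\", "\\\\")
                .Replace(".", "\\.")
                .Replace("{", "\\{")
                .Replace("}", "\\}")
                .Replace("[", "\\[")
                .Replace("]", "\\]")
                .Replace("(", "\\(")
                .Replace(")", "\\)")
                .Replace("|", "\\|")
                .Replace("^", "\\^")
                .Replace("+", "\\+")
                .Replace("$", "\\$")
                .Replace(" ", "\\s")
                .Replace("#", "[0-9]")
                .Replace("?", ".")
                .Replace("*", "\\w*")
                .Replace("%", ".*");
        // Like has to match the whole string; Search and Extract only a part of it.
        if (wholeString) pattern = "^" + pattern + "$";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}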
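A chain of Replace calls depends on the order in which the replacements run. As an alternative sketch (not the approach used above), the translation can be done in a single pass over the characters, delegating the escaping of everything that is not a wildcard to Regex.Escape. The names PatternTranslator and ToRegexPattern are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternTranslator
{
    // Translates the simplified syntax into a native regex pattern, one character at a time.
    public static string ToRegexPattern(string simplified)
    {
        var sb = new StringBuilder();
        foreach (char c in simplified)
        {
            switch (c)
            {
                case '?': sb.Append('.'); break;      // any single character
                case '%': sb.Append(".*"); break;     // zero or more characters
                case '*': sb.Append(@"\w*"); break;   // zero or more word characters
                case '#': sb.Append("[0-9]"); break;  // a single digit
                case ' ': sb.Append(@"\s"); break;    // any white space character
                default: sb.Append(Regex.Escape(c.ToString())); break; // literal
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToRegexPattern("*?@?*.?*")); // \w*.@.\w*\..\w*
        Console.WriteLine(Regex.IsMatch("[42]", "^" + ToRegexPattern("[##]") + "$")); // True
    }
}
```

Because each character is handled exactly once, a fragment produced for one wildcard can never be rewritten by a later step.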

Conclusion

As stated above, the intent is not to replace Regex but to provide a very simple approach that solves about 80% of the cases for which I previously needed Regex. This approach keeps basic tasks very simple and makes the client code easy to write and obvious to anyone who is not an expert in Regex syntax.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Guirec
Architect
New Zealand

Comments and Discussions

 
My Vote of 2 (robocodeboy, 18-Jan-13 1:28)
The article is well written, but I find that the whole thing you did is to give a really small subset of regular expressions some non-standard aliases.
 
I mean, why in the world a '?' should be more intuitive or easy to remember than a '.'?
 
Nice try, but I think it's a completely wrong approach.
 
I learned regexes and I can use them in all the languages I write software with.
 
Anyone would gain a lot more in learning that the world is using '.' to define any char and * to define "match what's before the star, repeated zero or more times".
 
Maybe a fluent syntax exposed to Intellisense could be better suited to what you wanted to accomplish.

Last Updated 17 Jan 2013
Article Copyright 2012 by Guirec