Regular Expressions in .NET

By Ovais Ahmad Khan

An introduction to regular expressions and how to use them in .NET.
C#, Windows, .NET 1.1, VS.NET2003, Dev
Posted: 6 Dec 2005
Updated: 17 Dec 2005
Introduction

Regular expressions have long been popular in languages such as Perl and AWK, where they are used for pattern matching, text manipulation and text searching. These languages are known specifically for their advanced pattern-matching features. .NET regular expressions are based on those of Perl and are compatible with Perl 5 regular expressions.

To begin with, they are not as complex as they look, especially once you start experimenting with them. I would recommend that you download a tool such as Expresso (http://www.ultrapico.com/) to become familiar with regular expressions.

Regular Expression Elements

Some of the commonly used regular expression elements are:


^                 Matches start of input
$                 Matches end of input
.                 Matches any character except new line
|                 OR
*                 Match the preceding expression 0 or more times
+                 Match the preceding expression 1 or more times
?                 Match the preceding expression 0 or 1 times
()                Logical group / sub-expression (captured as an automatically numbered group)
(?<name>exp)      Named capture group
(?=exp)           Match any position preceding a suffix exp
(?<=exp)          Match any position following a prefix exp
(?!exp)           Match any position after which exp is not found
(?<!exp)          Match any position before which exp is not found
[...]             List of characters to match
[^expression]     Not containing any of the specified characters
{n} or {n,m}      Quantifier (match an exact number or a range of instances)
(?(exp)yes|no)    If expression (exp) matches, match the yes part, else the no part
\                 Escape character (to match any of the special characters)
\w                Match any word character
\W                Match any non-word character
\s                Match any white space character
\S                Match any non-white space character
\d                Match any numeric digit
\D                Match any character that is not a numeric digit
\b                Match a backspace if inside a character class ([]); otherwise match the position at the beginning or end of a word
\t                Match tab
\r                Match carriage return
\n                Match line feed
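
To make a few of these elements concrete, here is a minimal sketch that exercises lookbehind, negative lookaround, a quantifier and alternation. The class name ElementsDemo, the method names and the sample strings are all made up for illustration; only the Regex calls come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class ElementsDemo
{
    // (?<=\$) is a lookbehind: it requires a preceding '$' without consuming it.
    // \d+ then matches the run of digits that follows.
    public static string PriceDigits(string input)
    {
        return Regex.Match(input, @"(?<=\$)\d+").Value;
    }

    // (?<!\d) and (?!\d) reject neighbouring digits, so only a run of
    // exactly three digits ({3}) can match.
    public static bool HasExactlyThreeDigits(string input)
    {
        return Regex.IsMatch(input, @"(?<!\d)\d{3}(?!\d)");
    }

    // ^ and $ anchor the whole input, | chooses between alternatives,
    // and s? allows an optional trailing 's'.
    public static bool IsPet(string input)
    {
        return Regex.IsMatch(input, @"^(cat|dog)s?$");
    }

    public static void Main()
    {
        Console.WriteLine(PriceDigits("Total: $42.50"));          // 42
        Console.WriteLine(HasExactlyThreeDigits("abc 123 def"));  // True
        Console.WriteLine(HasExactlyThreeDigits("1234"));         // False
        Console.WriteLine(IsPet("dogs"));                         // True
    }
}
```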



The following are matching substitutions:


$num       Substitute the last substring matched by group number num
${name}    Substitute the last substring matched by the group called name
$&         Substitute a copy of the entire match
$`         Substitute all the text of the input string before the match
$'         Substitute all the text of the input string after the match
$+         Substitute the last group captured
$_         Substitute the entire input string
$$         Substitute a literal $
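
As a rough illustration of how these substitutions behave in a replacement string, the following sketch uses the static Regex.Replace method. The class name SubstitutionDemo, the method names and the sample inputs are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class SubstitutionDemo
{
    // $1 and $2 refer to the numbered groups, so this swaps two words.
    public static string SwapNames(string input)
    {
        return Regex.Replace(input, @"(\w+)\s+(\w+)", "$2, $1");
    }

    // ${user} and ${host} refer to the named groups in the pattern.
    public static string SpellOutAt(string input)
    {
        return Regex.Replace(input, @"(?<user>\w+)@(?<host>\w+)", "${user} at ${host}");
    }

    // $& is the entire match; here every number is wrapped in brackets.
    public static string BracketNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", "[$&]");
    }

    // $$ emits a literal dollar sign, followed here by the match itself ($&).
    public static string AddDollar(string input)
    {
        return Regex.Replace(input, @"\d+", "$$$&");
    }

    public static void Main()
    {
        Console.WriteLine(SwapNames("John Smith"));         // Smith, John
        Console.WriteLine(SpellOutAt("bob@example.com"));   // bob at example.com
        Console.WriteLine(BracketNumbers("a1b22"));         // a[1]b[22]
        Console.WriteLine(AddDollar("price 5"));            // price $5
    }
}
```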



Regular expressions can also find repeating patterns through backreferences: you capture a group (by number or by name) and then refer to what it matched elsewhere in the expression. Named groups are also useful when you need to parse a string such as a free-form date or time.
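
A small sketch of backreferences, assuming a made-up class BackreferenceDemo: \1 refers back to the first numbered group, and \k<word> refers back to the group named word.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class BackreferenceDemo
{
    // (\w)\1 matches any word character immediately followed by itself.
    public static bool HasDoubledCharacter(string input)
    {
        return Regex.IsMatch(input, @"(\w)\1");
    }

    // (?<word>\w+) captures a word; \k<word> requires the same text again
    // after some white space. Returns the repeated word, or "" if none.
    public static string FindRepeatedWord(string input)
    {
        Match m = Regex.Match(input, @"\b(?<word>\w+)\s+\k<word>\b",
                              RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["word"].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(HasDoubledCharacter("book"));         // True
        Console.WriteLine(HasDoubledCharacter("bird"));         // False
        Console.WriteLine(FindRepeatedWord("This is is a test")); // is
    }
}
```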

Some Example Regular Expressions

  • Match a word - \btest\b
  • Match all 6 letter words - \b\w{6}\b
  • Match all 6 digit numbers - \b\d{6}\b
  • Match any number - \b\d+\b
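
The patterns above can be tried out with a few lines of C#. This is only a sketch: the class ExamplePatterns, the helper CountMatches and the sample sentence are invented for the purpose. Note that \w also matches digits, so the "6 letter words" pattern counts 123456 as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class ExamplePatterns
{
    // Counts how many times the pattern matches in the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string text = "test 123456 planet 42 tested";

        Console.WriteLine(CountMatches(text, @"\btest\b"));   // 1 ("tested" is not a whole-word match)
        Console.WriteLine(CountMatches(text, @"\b\w{6}\b"));  // 3 (123456, planet, tested)
        Console.WriteLine(CountMatches(text, @"\b\d{6}\b"));  // 1 (123456)
        Console.WriteLine(CountMatches(text, @"\b\d+\b"));    // 2 (123456, 42)
    }
}
```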

Instead of giving loads of examples here, I suggest that you download Expresso and check its analyzer view for detailed analysis of the regular expression.

Regular Expressions in .NET

As already discussed, .NET regular expressions are based on those of Perl and are compatible with Perl 5 regular expressions. .NET contains a set of powerful classes that make it even easier to use regular expressions. The classes are available in the System.Text.RegularExpressions namespace. The following is a list of classes in the namespace:

Class - Description

Capture - Represents the results from a single subexpression capture. Capture represents one substring for a single successful capture.

CaptureCollection - Represents a sequence of capture substrings. CaptureCollection returns the set of captures done by a single capturing group.

Group - Represents the results from a single capturing group. A capturing group can capture zero, one, or more strings in a single match because of quantifiers, so Group supplies a collection of Capture objects.

GroupCollection - Represents a collection of captured groups. GroupCollection returns the set of captured groups in a single match.

Match - Represents the results from a single regular expression match.

MatchCollection - Represents the set of successful matches found by iteratively applying a regular expression pattern to the input string.

Regex - Represents an immutable regular expression.

RegexCompilationInfo - Provides information that the compiler uses to compile a regular expression to a stand-alone assembly.
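
To show how Match, Group and Capture relate, here is a minimal sketch in which a quantified group captures several substrings within one match. The class name CaptureDemo and the pattern are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class CaptureDemo
{
    // The group "digit" sits under a + quantifier, so it captures once per digit.
    private const string Pattern = @"^(?<digit>\d)+$";

    // Group.Value holds only the last capture made by the group.
    public static string LastCapture(string input)
    {
        return Regex.Match(input, Pattern).Groups["digit"].Value;
    }

    // Group.Captures (a CaptureCollection) holds every capture, in order.
    public static string[] AllCaptures(string input)
    {
        Group g = Regex.Match(input, Pattern).Groups["digit"];
        string[] result = new string[g.Captures.Count];
        for (int i = 0; i < g.Captures.Count; i++)
            result[i] = g.Captures[i].Value;
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(LastCapture("123"));                    // 3
        Console.WriteLine(string.Join(",", AllCaptures("123")));  // 1,2,3
    }
}
```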

How to validate an input string in .NET

  • Create a Regex object "RegexObj"
  • Call RegexObj.IsMatch ( subjectString ), which returns a Boolean indicating whether the input string is valid
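
The two steps above might look like this in practice. The validator below is a sketch that assumes a six-digit code as the thing being validated; the class name CodeValidator is invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class CodeValidator
{
    // Step 1: create the Regex object once and reuse it.
    // Note: in .NET, $ also matches just before a final newline;
    // use \z instead if a trailing newline must be rejected.
    private static readonly Regex RegexObj = new Regex(@"^\d{6}$");

    // Step 2: IsMatch returns true only when the whole input is six digits.
    public static bool IsValid(string subjectString)
    {
        return RegexObj.IsMatch(subjectString);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("123456"));   // True
        Console.WriteLine(IsValid("12345a"));   // False
        Console.WriteLine(IsValid("1234567"));  // False
    }
}
```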

How to perform regular expression substitution (search and replace) in .NET

  • Create a Regex object "RegexObj"
  • Call RegexObj.Replace ( subjectString, replaceString ), which returns a new string with every match replaced by replaceString
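
A short sketch of the search-and-replace steps, assuming the task is to collapse runs of white space into single spaces. The class name WhitespaceNormalizer is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class WhitespaceNormalizer
{
    // Step 1: create the Regex object; \s+ matches one or more white space characters.
    private static readonly Regex RegexObj = new Regex(@"\s+");

    // Step 2: Replace returns a new string; the input string is not modified.
    public static string Normalize(string subjectString)
    {
        return RegexObj.Replace(subjectString.Trim(), " ");
    }

    public static void Main()
    {
        string original = "  a   b\t c ";
        Console.WriteLine("[" + Normalize(original) + "]");  // [a b c]
    }
}
```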

How to parse an input string in .NET

  • Create a Regex object "RegexObj", making sure to name the groups in the expression
  • Call RegexObj.Match ( subjectString ) to get the first match, or RegexObj.Matches ( subjectString ) to get the collection of all matches in the input string
  • Iterate through the matches, reading the named groups, to perform post parsing
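
The parsing steps above can be sketched as follows, assuming dates written as day/month/year embedded in free text. The class name DateExtractor, the group names and the output format are choices made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and inputs are illustrative only.
public static class DateExtractor
{
    // Step 1: a Regex with named groups for each part of the date.
    private static readonly Regex RegexObj =
        new Regex(@"(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})");

    // Steps 2 and 3: get every match, then read the named groups of each one.
    public static List<string> ExtractDates(string subjectString)
    {
        List<string> dates = new List<string>();
        foreach (Match m in RegexObj.Matches(subjectString))
        {
            string day = m.Groups["day"].Value.PadLeft(2, '0');
            string month = m.Groups["month"].Value.PadLeft(2, '0');
            string year = m.Groups["year"].Value;
            dates.Add(year + "-" + month + "-" + day);
        }
        return dates;
    }

    public static void Main()
    {
        foreach (string d in ExtractDates("Posted 6/12/2005, updated 17/12/2005"))
            Console.WriteLine(d);
        // 2005-12-06
        // 2005-12-17
    }
}
```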

Free-form time parsing function in .NET

The following is a utility function that parses a free-format time string such as "5", "5:30" or "5:30 pm". It could be extended into a combined date and time parser, along with many more enhancements. If anyone needs further help, feel free to contact me.

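using System;
using System.Text.RegularExpressions;

// The named groups "hour", "min" and "am_pm" are read by ParseTime.
// The outer group only scopes the pattern, so it is non-capturing.
private const string TIME_STR = @"^\s?(?:"
           + @"(?<hour>\d{1,2})" + @"(:(?<min>\d{1,2}))?"
           + @"\s?((?<am_pm>(am|pm)))?"
           + @")\s?$";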

static DateTime ParseTime (string strTime)
{
    int hour = 0, min = 0;
    Regex regExTime = new Regex (TIME_STR,
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled);

    Match m = regExTime.Match (strTime);

    if (!m.Success)
        throw new FormatException ("Invalid time format");

    // Optional groups report Success == false when they took no part in the match.
    if (m.Groups["hour"].Success)
        hour = Int32.Parse (m.Groups["hour"].Value);

    if (m.Groups["min"].Success)
        min = Int32.Parse (m.Groups["min"].Value);

    if (m.Groups["am_pm"].Success)
        hour = ConvertAmPm (m.Groups["am_pm"].Value, hour);

    if (hour > 23 || min > 59)
        throw new FormatException ("Invalid time format");

    // Combine today's date with the parsed time of day.
    DateTime today = DateTime.Today;
    return new DateTime (today.Year, today.Month, today.Day, hour, min, 0);
}

private static int ConvertAmPm (string amPm, int hour)
{
    int retHour = hour;
    amPm = amPm.ToLower();

    if (amPm.Equals("am"))
    {
        // all hours remain the same except 12:00 am (which is 0000 hours)
        if (hour == 12)
            retHour = 0;
    }
    else if (amPm.Equals("pm"))
    {
        // add 12 to hours except if 12:00 pm
        if (hour != 12)
            retHour = hour + 12;
    }
    else
    {
        throw new FormatException ("Invalid amPm flag format");
    }

    return retHour;
}

Expresso Analysis of the regular expression used above is shown in the figure below. This should help you understand the details.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Ovais Ahmad Khan


Ovais Khan is a Software Engineer at Kalsoft (Pvt.) Ltd, where he is working on C# and C++ projects for clients in Pakistan and UAE.

He holds an MS in Computer Science. For more details, please visit Ovais Khan's home page and his blog.

Occupation: Web Developer
Location: Hong Kong Hong Kong

Copyright 2005 by Ovais Ahmad Khan