
RegEx Tester - Regular Expression Tester

By BucanerO_Slacker

It helps you develop and fully test your regular expressions against a target text.
C# (C# 1.0, C# 2.0, C# 3.0, C#), Windows (Windows, Win2K, WinXP, Win2003, Vista), .NET (.NET, .NET 2.0), Win32, Visual Studio (VS2005, VS), Dev

Posted: 1 Mar 2008
Updated: 8 Mar 2008

RegExTester3.jpg

Introduction

With RegEx Tester you can fully develop and test your regular expressions against a target text.
Its UI is designed to aid you in RegEx development, especially for the big and complex ones.
It uses and supports almost ALL of the features available in the .NET Regex class.
A note about the writing: English is not my native language, and I did my best with the wording, grammar and spelling. You will probably find writing errors, so please tell me about them so I can correct them.

Feature list

If you have an idea for a new feature, you can code it up and send it to me by email, and I will add it here with proper credit for your work. If you don't know how to code that idea, comment about it and other people (or I) can do it. Let's collaborate.

  • Asynchronous execution, which lets the user abort a running test, even after making a Catastrophic Backtracking mess. [by Pablo Oses]
  • Indented Input mode, which strips \r \n \t \v and spaces before execution. This allows you to write those ugly, long and cryptic RegExs in an indented and spaced fashion. [by Pablo Oses]
  • A button to "Clipboard Copy" the RegEx in "C# escaped string", "html encoded" or "Plain text" mode. [by Kurt Griffiths and Pablo Oses]
  • F5 hot key to run the test without changing the cursor position or selection. Get off of that mouse! [by Pablo Oses]
  • Listing of the matches showing position, length and anonymous or named capture groups. [by Davide Mauri and Pablo Oses]
  • Adjustable size for each of the 3 sections of the window (RegEx, Text and Results). [by Pablo Oses]
  • Ignore Case, MultiLine, SingleLine and Culture Invariant options. [by Davide Mauri and Pablo Oses]
  • Window resizing and maximizing capability. [by Kurt Griffiths and Pablo Oses]
  • Links to a RegEx Library and ILoveJackDaniels' CheatSheet. [by Pablo Oses]
  • Test Text highlighting based on the results selection. [by Davide Mauri]
  • Find function inside the Test Text. [by Pablo Oses]

What is special about this program?

As balazs_hideghety said in his comment, there are other popular programs (e.g. RegexBuddy and Expresso) that are extremely powerful and seem to be the last word on RegEx testing and development, but I still use this tool. Why? I needed a tool that helped me design HTML extraction RegExs. For example (as of March 2008), if you evaluate this:

<td class="Frm_MsgSubject"><[^>]*?>(?<title>.*?)</a>.*?<td class="Frm_MsgAuthor"><[^>]*?>(?<author>.*?)</a>.*?
<td class="Frm_MsgDate"[^>]*?>(?<date>.*?)&.*?<td class="MsgBd BdSel ">.*?<td colspan="2">(?<body>[^<]*?)<
against the HTML source of this page, you parse and extract, in a single operation, all the comments on the page with their corresponding title, body, date, user, etc. RegExs are really powerful when you are extracting data from real-world websites. The problem is that the RegExs needed are looooooong and extremely cryptic. You really start to need some spacing and indentation to deobfuscate them, plus a big window with a lot of space and a test textbox able to handle big raw HTML documents. That is where this tool turns out to be really useful.
The same RegEx looked like this while I was developing it in RegExTester:
<td\sclass="Frm_MsgSubject">    <[^>]*?    >    (?<title>.*?)    </a>
.*?
<td\sclass="Frm_MsgAuthor">        <[^>]*?    >    (?<author>.*?)    </a>
.*?
<td\sclass="Frm_MsgDate"    [^>]*?    >    (?<date>.*?)    &
.*?
<td\sclass="MsgBd\sBdSel\s">    .*?    <td\scolspan="2">    (?<body>[^<]*?)    <
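As you can see, aiding the development of ugly RegExs is, I think, the key feature of this program. Note that literal spaces are written as \s here, because Indented Input mode strips real spaces before execution.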
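
As a rough sketch of what Indented Input mode does with such a pattern, the console snippet below strips the whitespace from a shortened, two-group version of the indented RegEx and runs it against a small HTML fragment. The HTML fragment, the class name IndentedPatternDemo and the helper names Strip and Extract are assumptions made up for this illustration; only the pattern pieces come from the article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndentedPatternDemo
{
    // Same idea as the article's stripIndentation(): drop \r \n \t \v and spaces.
    public static string Strip(string text)
    {
        return text.Replace("\r", "").Replace("\n", "").Replace("\t", "")
                   .Replace("\v", "").Replace(" ", "");
    }

    // Shortened version of the indented pattern (title and author only).
    public const string IndentedPattern = @"
<td\sclass=""Frm_MsgSubject"">    <[^>]*?    >    (?<title>.*?)     </a>
.*?
<td\sclass=""Frm_MsgAuthor"">     <[^>]*?    >    (?<author>.*?)    </a>";

    // Hypothetical HTML fragment, invented for this example.
    public const string SampleHtml =
        "<td class=\"Frm_MsgSubject\"><a href=\"#1\">Hello</a></td>\n" +
        "<td class=\"Frm_MsgAuthor\"><a href=\"#u\">alice</a></td>";

    public static Match Extract()
    {
        // Singleline lets .*? run across the line break between the two cells.
        Regex regex = new Regex(Strip(IndentedPattern), RegexOptions.Singleline);
        return regex.Match(SampleHtml);
    }

    public static void Main()
    {
        Match m = Extract();
        Console.WriteLine(m.Groups["title"].Value + " by " + m.Groups["author"].Value);
    }
}
```

The SingleLine checkbox in the tool corresponds to RegexOptions.Singleline used here; without it, .*? would stop at the first line break of a real page.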

The core of the program: AsyncTest()

This is a simplified version of the function to make it more readable.

// Create the options object based on the UI checkboxes
RegexOptions regexOptions = new RegexOptions();
if (cbIgnoreCase.Checked) regexOptions |= RegexOptions.IgnoreCase;
if (cbMultiLine.Checked) regexOptions |= RegexOptions.Multiline;
if (cbSingleLine.Checked) regexOptions |= RegexOptions.Singleline;
if (cbCultureInvariant.Checked) regexOptions |= RegexOptions.CultureInvariant;

// Create the RegEx string with optional manipulations
string regexString = txtRegEx.Text;
if (cbIndentedInput.Checked) regexString = stripIndentation(regexString);

// Creates the RegEx engine passing the RegEx string and the options object
Regex regex = new Regex(regexString, regexOptions);

// This executes the Regex and collects the results
// The execution isn't done until a member of the matchCollection is read.
// So I read the Count property for the regex to really execute from start to finish
MatchCollection matchCollection = regex.Matches(rtbText.Text);
int matchesCount = matchCollection.Count;

// Add the Capture Group columns to the Results ListView
int[] groupNumbers = regex.GetGroupNumbers();
string[] groupNames = regex.GetGroupNames();
string groupName = null;

foreach (int groupNumber in groupNumbers)
{
    if (groupNumber > 0)
    {
        groupName = "Group " + groupNumber;
        // GroupNameFromNumber also works when the group numbers are not contiguous
        string name = regex.GroupNameFromNumber(groupNumber);
        if (name != groupNumber.ToString())
            groupName += " (" + name + ")";
        lvResult.Columns.Add(groupName, 100, HorizontalAlignment.Left);
    }
}

// Process each of the Matches!
foreach (Match match in matchCollection)
{
    //Add it to the grid
    ListViewItem lvi = lvResult.Items.Add(match.ToString());
    lvi.SubItems.Add(match.Index.ToString());
    lvi.SubItems.Add(match.Length.ToString());
    for (int c = 1; c < match.Groups.Count; c++)
    {
        lvi.SubItems.Add(match.Groups[c].Value);
    }

    // Highlight the match in the RichTextBox
    rtbText.Select(match.Index, match.Length);
    rtbText.SelectionColor = Color.Red;
}
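
To try the same logic without the WinForms controls, here is a minimal console sketch, assuming you only want one text row per match instead of a ListView. The class name MatchListingDemo and the method Describe are hypothetical names for this example and are not part of the program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchListingDemo
{
    // Returns one row per match: value, index, length, then each capture group.
    public static List<string> Describe(string pattern, string text, RegexOptions options)
    {
        Regex regex = new Regex(pattern, options);
        List<string> rows = new List<string>();

        foreach (Match match in regex.Matches(text))
        {
            string row = "'" + match.Value + "' at " + match.Index + ", length " + match.Length;

            foreach (int groupNumber in regex.GetGroupNumbers())
            {
                // Group 0 is the whole match, so skip it.
                if (groupNumber == 0) continue;
                string name = regex.GroupNameFromNumber(groupNumber);
                row += " | " + name + "=" + match.Groups[groupNumber].Value;
            }
            rows.Add(row);
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string row in Describe(@"(?<key>\w+)=(\d+)", "a=1, bb=22", RegexOptions.None))
            Console.WriteLine(row);
    }
}
```

In .NET, unnamed groups are numbered before named ones, so in this sample the (\d+) group is group 1 and "key" comes after it. That is why the Results columns in the tool can appear in a different order than the groups in the pattern.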

The asynchronous execution feature. Where the fun starts!

I first coded it using a BackgroundWorker, but I had to throw that out because it seems to be useful only when you want to abort a long loop that is IN your own code. It doesn't help when you call an external function that takes too long to complete.

So I re-coded it from scratch using lower-level Thread management, which turned out to be simpler and clearer than the previous technique once it was done.

private Thread worker; // The worker that really does the execution in a separate thread.

private void MainForm_Load(object sender, System.EventArgs e)
{
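    // This is a critical line.
    // It turns off the cross-thread call check so the worker thread can touch the controls of this form.
    // Note: it only disables the safety check; it does not make the access thread-safe.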
    Control.CheckForIllegalCrossThreadCalls = false;
}

/// <summary>
/// Handle the multiple behaviors of the Test button based on its text
/// </summary>
private void btnTest_Click(object sender, System.EventArgs e)
{
    if (btnTest.Text == STOPPED_MODE_BUTTON_TEXT)
    {
        StartTest();
    }
    else if (btnTest.Text == RUNNING_MODE_BUTTON_TEXT)
    {
        AbortTest();
    }
}

/// <summary>
/// Prepare and launch the asynchronous execution using a dedicated worker Thread
/// </summary>
private void StartTest()
{
    // Creates the separate Thread for executing the Test
    worker = new Thread(AsyncTest);

    // With this setting, if the worker hangs and the main thread exits, nobody has to
    // wait for the worker to finish. (e.g. The worker will be aborted if the user closes the app.)
    worker.IsBackground = true;

    // Start the Asynchronous Test function
    worker.Start();
}

/// <summary>
/// Aborts the asynchronous execution of the Test.
/// </summary>
private void AbortTest()
{
    // This generates a ThreadAbortException at the worker function AsyncTest()
    if (worker.IsAlive) worker.Abort();
}

/// <summary>
/// This is the core of the app. The RegEx execution and processing function.
/// It runs on a separate thread.
/// </summary>
private void AsyncTest()
{
    // Every line in this function is susceptible of a ThreadAbortException
    // Which is how the user is able to stop it.
    try
    {
        // ***************************************
        // Here is the code that you already read 
        // in the previous section of this article
        // [The core of the program: AsyncTest()]
        // ***************************************
    }
    catch (ThreadAbortException)
    {
        sbpStatus.Text = "Test aborted by the user.";
    }
    catch (Exception e)
    {
        sbpStatus.Text = "Test aborted by an error.";
        // Any other Exception is shown to the user
        MessageBox.Show(e.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
    finally
    {
        // Restore the btnTest functionality
        btnTest.Text = STOPPED_MODE_BUTTON_TEXT;
    }
}
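
Thread.Abort() fits the .NET 2.0 target of this program, but it is not supported on .NET Core and .NET 5 or later. As a hedged alternative sketch, and not what RegEx Tester does, the Regex constructor in .NET Framework 4.5 and later accepts a match timeout, so a runaway pattern ends with a RegexMatchTimeoutException instead of having to be aborted from outside. The class name RegexTimeoutDemo, the method CountMatches and the -1 return convention are assumptions for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTimeoutDemo
{
    // Returns the number of matches, or -1 if the engine gave up after 'timeout'.
    public static int CountMatches(string pattern, string text, TimeSpan timeout)
    {
        try
        {
            Regex regex = new Regex(pattern, RegexOptions.None, timeout);
            // Reading Count forces the regex to run from start to finish.
            return regex.Matches(text).Count;
        }
        catch (RegexMatchTimeoutException)
        {
            return -1;
        }
    }

    public static void Main()
    {
        TimeSpan limit = TimeSpan.FromMilliseconds(200);

        // A harmless pattern finishes well within the limit.
        Console.WriteLine(CountMatches(@"\d+", "10 20 30", limit));

        // Nested quantifiers against a non-matching tail are a classic
        // catastrophic backtracking case, so this call is expected to hit the limit.
        Console.WriteLine(CountMatches(@"^(a+)+$", new string('a', 40) + "!", limit));
    }
}
```

The timeout does not give the user a Stop button, so it complements the abort approach rather than replacing it: the thread-based design lets the user decide when to give up, while a timeout enforces a fixed upper bound.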

The Strip Indentation feature

This is plain and simple. I always try to KISS.

/// <summary>
/// This function removes the \r \n \t \v and ' ' from any string
/// It's used at StartTest, cbIndentedInput_CheckedChanged and copyGenericTSMenuItem_Click
/// </summary>
private string stripIndentation(string text)
{
    return text
        .Replace("\r", "")
        .Replace("\n", "")
        .Replace("\t", "")
        .Replace("\v", "")
        .Replace(" ", "");
}
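
One consequence of this approach is that every literal space in the pattern disappears too, so spaces you actually want to match must be written as \s. The sketch below illustrates that and compares it with the built-in RegexOptions.IgnorePatternWhitespace option, which ignores unescaped whitespace in the pattern in a similar way. The class name WhitespaceStrippingDemo and the sample patterns are made up for this example, and the comparison is an aside rather than how RegEx Tester works.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceStrippingDemo
{
    // Same replacements as the article's stripIndentation().
    public static string Strip(string text)
    {
        return text
            .Replace("\r", "")
            .Replace("\n", "")
            .Replace("\t", "")
            .Replace("\v", "")
            .Replace(" ", "");
    }

    public static void Main()
    {
        string input = "a b";

        // The literal space is stripped, so the pattern becomes "ab" and no longer matches.
        Console.WriteLine(Regex.IsMatch(input, Strip("a b")));

        // Writing the space as \s survives the stripping: the pattern becomes a\sb.
        Console.WriteLine(Regex.IsMatch(input, Strip(@"a   \s   b")));

        // The built-in option gives the same result without a helper function.
        Console.WriteLine(Regex.IsMatch(input, @"a   \s   b",
            RegexOptions.IgnorePatternWhitespace));
    }
}
```

A difference worth knowing: Strip also removes spaces inside character classes (so "[ ]" becomes "[]"), whereas IgnorePatternWhitespace is documented to treat whitespace inside a character class literally.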

The Copy feature

/// <summary>
/// This function handles all the Copy context menu options.
/// Formats the regex and copies it to the clipboard
/// </summary>
private void copyGenericTSMenuItem_Click(object sender, EventArgs e)
{
    // Grab the original text
    string regex = txtRegEx.Text;

    // Optionally apply the stripIndentation function
    if (cbIndentedInput.Checked) regex = stripIndentation(regex);

    // I used the Tag attribute of each MenuItem to identify which one was pressed
    string format = ((ToolStripMenuItem)sender).Tag.ToString();
    if (format == "cs")
    {
        regex = "@\"" + regex.Replace("\"", "\"\"") + "\""; //change  my"quo\te  into  @"my""quo\te"
    }
    else if (format == "html")
    {
        regex = System.Web.HttpUtility.HtmlEncode(regex);
    }

    // Copy it to the clipboard. Clipboard.SetText fails if regex is ""
    if (!string.IsNullOrEmpty(regex)) Clipboard.SetText(regex);
}
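
To see the two formats without a clipboard or a menu, here is a small console sketch of the same formatting step. It assumes System.Net.WebUtility.HtmlEncode (available since .NET Framework 4.0) in place of System.Web.HttpUtility, so that no reference to System.Web is needed. The class name CopyFormatDemo and the method Format are hypothetical names for this example.

```csharp
using System;
using System.Net;

public static class CopyFormatDemo
{
    // Mirrors the "cs" / "html" / plain branches of the Copy handler.
    public static string Format(string regex, string format)
    {
        if (format == "cs")
        {
            // Verbatim C# string: only the double quote needs escaping, by doubling it.
            return "@\"" + regex.Replace("\"", "\"\"") + "\"";
        }
        if (format == "html")
        {
            return WebUtility.HtmlEncode(regex);
        }
        return regex; // plain text
    }

    public static void Main()
    {
        Console.WriteLine(Format("my\"quo\\te", "cs"));   // @"my""quo\te"
        Console.WriteLine(Format("<a>&", "html"));        // &lt;a&gt;&amp;
    }
}
```

The verbatim (@"...") form is a good fit for RegExs because backslashes stay as they are, which keeps the copied string identical to what you tested.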

Program's History

This tool was originally written by Davide Mauri (2003). I used it A LOT at work and for personal projects, and because it was open source I started to add the new features that I needed. One day the program was so different that I wanted to give all these enhancements back to Davide, so I contacted him by email. He gave me permission to re-release it and pointed me to Kurt Griffiths' version of the program (2006). I merged his enhancements with mine and polished the UI for the real world.

Article History

  • 2008-03-02 - Initial article
  • 2008-03-04 - Article completely rewritten to show and comment pieces of code used in the project.
  • 2008-03-05 - Article adapted to the new 3.0.0.0 version of the app with async execution.
  • 2008-03-08 - Moved histories to the bottom and updated links.

Program History

  • 2003-xx-xx - 1.0.0.3 - by Davide Mauri - Original version
  • 2006-xx-xx - 1.0.0.3 - by Kurt Griffiths - New features: Copy and window resize
  • 2008-03-02 - 2.0.0.0 - by Pablo Osés - New features: Group names, window maximize, Hot Keys, Indented Input, Culture Invariant, Resizeable Panels.
  • 2008-03-03 - 2.0.1.0 - by Pablo Osés - New features: RegEx Cheat-Sheet // Bug fixes: Multiline behavior, performance issues and results list click event.
  • 2008-03-05 - 3.0.0.0 - by Pablo Osés - New features: Asynchronous execution, Copy Feature enhanced, Test Textbox context menu and Find functions.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPL)

About the Author

BucanerO_Slacker


I'm from Argentina, 25 years old (in 2008), and I have worked as a software developer since I was 16, after "playing" with computers since I was 8 (good old Z80... those were good years...)
I've worked with waaaaaay too many languages and technologies... And lately I've started to feel that I want to participate in and contribute to open source projects and tools...
Occupation: Software Developer (Senior)
Company: Asignet (www.asignet.com)
Location: Argentina

Copyright 2008 by BucanerO_Slacker