Introduction

With RegEx Tester you can fully develop and test your regular expression against a target text.

Feature list

If you have an idea for a new feature, you can code it up and send it to me by email, and I will add it here with proper credit for your work. If you don't know how to code the idea, comment about it and other people (or I) can do it. Let's collaborate.
What is special about this program?

As balazs_hideghety said in his comment, there are other popular programs (e.g. RegexBuddy and Expresso) that are extremely powerful and seem to be the last word on RegEx testing and development, but I still use this tool. Why? I needed a tool that helped me design HTML extraction RegExs. For example (as of March 2008), if you evaluate this:

<td class="Frm_MsgSubject"><[^>]*?>(?<title>.*?)</a>.*?<td class="Frm_MsgAuthor"><[^>]*?>(?<author>.*?)</a>.*?<td class="Frm_MsgDate"[^>]*?>(?<date>.*?)&.*?<td class="MsgBd BdSel ">.*?<td colspan="2">(?<body>[^<]*?)<

against the HTML source of this page, you parse and extract in a single operation all the comments on this page, each with its corresponding title, body, date, user, etc. RegExs are really powerful when you are extracting data from real-world websites. The problem is that the RegExs needed are long and extremely cryptic. You really start to need some spacing and indentation to deobfuscate them, and you need a big window with a lot of space and a test textbox able to handle big raw HTML documents. That is where this tool turns out to be really useful.

The same RegEx, while I was developing it in RegExTester, looked like this:

<td\sclass="Frm_MsgSubject"> <[^>]*? > (?<title>.*?) </a>
.*?
<td\sclass="Frm_MsgAuthor"> <[^>]*? > (?<author>.*?) </a>
.*?
<td\sclass="Frm_MsgDate" [^>]*? > (?<date>.*?) &
.*?
<td\sclass="MsgBd\sBdSel\s"> .*? <td\scolspan="2"> (?<body>[^<]*?) <

As you can see, I think that aiding the development of ugly RegExs is the key feature of this program.
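To make the extraction idea concrete, here is a minimal console sketch of reading named groups from every match in one pass. The sample markup, the class names `subject` and `author`, and the `NamedGroupDemo` type are hypothetical stand-ins for illustration; they are much simpler than the real page and are not taken from it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical sample markup, far smaller than a real forum page.
    public const string SampleHtml =
        "<td class=\"subject\"><a href=\"#1\">First post</a></td>" +
        "<td class=\"author\"><a href=\"#u1\">alice</a></td>" +
        "<td class=\"subject\"><a href=\"#2\">Second post</a></td>" +
        "<td class=\"author\"><a href=\"#u2\">bob</a></td>";

    // Same style as the article's pattern: lazy quantifiers plus named groups.
    public const string Pattern =
        "<td class=\"subject\"><[^>]*?>(?<title>.*?)</a>.*?" +
        "<td class=\"author\"><[^>]*?>(?<author>.*?)</a>";

    // Returns one "title by author" string per match.
    public static string[] Extract(string html)
    {
        // Singleline lets '.' cross line breaks, which real HTML usually needs.
        MatchCollection matches = Regex.Matches(html, Pattern, RegexOptions.Singleline);
        string[] rows = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            rows[i] = m.Groups["title"].Value + " by " + m.Groups["author"].Value;
        }
        return rows;
    }

    public static void Main()
    {
        foreach (string row in Extract(SampleHtml))
            Console.WriteLine(row);
        // Prints:
        // First post by alice
        // Second post by bob
    }
}
```

Each match carries all of its named groups, so a single call to `Regex.Matches` yields every record on the page.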
The core of the program: AsyncTest()

This is a simplified version of the function to make it more readable.

// Create the options object based on the UI checkboxes
RegexOptions regexOptions = new RegexOptions();
if (cbIgnoreCase.Checked) regexOptions |= RegexOptions.IgnoreCase;
if (cbMultiLine.Checked) regexOptions |= RegexOptions.Multiline;
if (cbSingleLine.Checked) regexOptions |= RegexOptions.Singleline;
if (cbCultureInvariant.Checked) regexOptions |= RegexOptions.CultureInvariant;
// Create the RegEx string with optional manipulations
string regexString = txtRegEx.Text;
if (cbIndentedInput.Checked) regexString = stripIndentation(regexString);
// Creates the RegEx engine passing the RegEx string and the options object
Regex regex = new Regex(regexString, regexOptions);
// This executes the Regex and collects the results
// The execution isn't done until a member of the matchCollection is read.
// So I read the Count property for the regex to really execute from start to finish
MatchCollection matchCollection = regex.Matches(rtbText.Text);
int matchesCount = matchCollection.Count;
// Add the Capture Group columns to the Results ListView
int[] groupNumbers = regex.GetGroupNumbers();
foreach (int groupNumber in groupNumbers)
{
    if (groupNumber > 0)
    {
        string groupName = "Group " + groupNumber;
        // GroupNameFromNumber also works when group numbers are not contiguous,
        // where indexing the GetGroupNames() array by group number would fail
        string name = regex.GroupNameFromNumber(groupNumber);
        if (name != groupNumber.ToString())
            groupName += " (" + name + ")";
        lvResult.Columns.Add(groupName, 100, HorizontalAlignment.Left);
    }
}
// Process each of the Matches!
foreach (Match match in matchCollection)
{
    // Add it to the grid
    ListViewItem lvi = lvResult.Items.Add(match.ToString());
    lvi.SubItems.Add(match.Index.ToString());
    lvi.SubItems.Add(match.Length.ToString());
    for (int c = 1; c < match.Groups.Count; c++)
    {
        lvi.SubItems.Add(match.Groups[c].Value);
    }
    // Highlight the match in the RichTextBox
    rtbText.Select(match.Index, match.Length);
    rtbText.SelectionColor = Color.Red;
}
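The column-naming step can be tried outside the form. The following is a minimal console sketch, not the program's actual code: `GroupHeaderDemo` and `BuildHeaders` are hypothetical names, and the sketch looks names up with `Regex.GroupNameFromNumber`, which stays valid even if the pattern uses explicit, non-contiguous group numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupHeaderDemo
{
    // Builds one header per capture group, skipping group 0 (the whole match).
    public static List<string> BuildHeaders(Regex regex)
    {
        var headers = new List<string>();
        foreach (int groupNumber in regex.GetGroupNumbers())
        {
            if (groupNumber == 0) continue;

            string header = "Group " + groupNumber;
            string name = regex.GroupNameFromNumber(groupNumber);
            // Unnamed groups report their own number as their name.
            if (name != groupNumber.ToString())
                header += " (" + name + ")";
            headers.Add(header);
        }
        return headers;
    }

    public static void Main()
    {
        var regex = new Regex(@"(\d+)-(?<word>[a-z]+)");
        foreach (string header in BuildHeaders(regex))
            Console.WriteLine(header);
        // Prints:
        // Group 1
        // Group 2 (word)
    }
}
```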
The asynchronous execution feature. Where the fun starts!

I first coded it using a BackgroundWorker, but I had to throw it out because it seems to be useful only when you want to abort a long loop that is in your own code. It doesn't help when you call an external function that takes too long to complete. So I re-coded it from scratch using lower-level Thread management, which turned out to be simpler and clearer than the previous technique once it was done.

private Thread worker; // The worker that really does the execution in a separate thread.
private void MainForm_Load(object sender, System.EventArgs e)
{
    // This is a critical line.
    // It disables the cross-thread call check so the other thread can access the controls of this class/object.
    Control.CheckForIllegalCrossThreadCalls = false;
}
///
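Since the rest of the threading code is not shown here, the following is only a sketch of the general idea under stated assumptions, not the program's implementation. It runs the regex on a separate `Thread` and reports the result through a callback. Instead of stopping the worker from outside, it passes a match timeout to the `Regex` constructor (available since .NET Framework 4.5), so a runaway pattern ends with a `RegexMatchTimeoutException`. `WorkerThreadDemo`, `StartTest` and `onFinished` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;

public static class WorkerThreadDemo
{
    // Starts the regex test on a worker thread and returns that thread.
    // onFinished receives the match count, or -1 if the engine timed out.
    public static Thread StartTest(string pattern, string input,
                                   TimeSpan limit, Action<int> onFinished)
    {
        var worker = new Thread(() =>
        {
            int count;
            try
            {
                var regex = new Regex(pattern, RegexOptions.None, limit);
                // Reading Count forces the lazy MatchCollection to run to the end.
                count = regex.Matches(input).Count;
            }
            catch (RegexMatchTimeoutException)
            {
                count = -1;
            }
            onFinished(count);
        });
        worker.IsBackground = true; // do not keep the process alive on exit
        worker.Start();
        return worker;
    }

    public static void Main()
    {
        int result = 0;
        Thread worker = StartTest(@"\d+", "1 22 333",
                                  TimeSpan.FromSeconds(1), c => result = c);
        // A UI thread would stay responsive here; the console demo just waits.
        worker.Join();
        Console.WriteLine(result); // prints 3
    }
}
```

In a Windows Forms application the callback runs on the worker thread, so it would typically hand the result to the UI thread with `Control.Invoke` or `Control.BeginInvoke` rather than touching controls directly.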
The Strip Indentation feature

This is plain and simple. I always try to KISS.

///
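The original listing is not reproduced here, so the following is only a guess at what such a step could look like, not the program's `stripIndentation` function. It assumes that every literal space, tab and line break in the pattern is layout and can be dropped, which is why the indented pattern earlier writes `\s` wherever a real space must match. `StripIndentationDemo` and `Strip` are hypothetical names.

```csharp
using System;
using System.Text;

public static class StripIndentationDemo
{
    // Removes spaces, tabs and line breaks so an indented pattern
    // collapses back into a single-line regular expression.
    public static string Strip(string indentedPattern)
    {
        var sb = new StringBuilder(indentedPattern.Length);
        foreach (char c in indentedPattern)
        {
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n')
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string indented =
            "<td\\sclass=\"a\"> <[^>]*? > (?<title>.*?) </a>\r\n" +
            "    .*?\r\n" +
            "<td\\sclass=\"b\">";
        Console.WriteLine(Strip(indented));
        // Prints: <td\sclass="a"><[^>]*?>(?<title>.*?)</a>.*?<td\sclass="b">
    }
}
```

.NET also offers `RegexOptions.IgnorePatternWhitespace`, which makes the engine itself ignore unescaped whitespace in the pattern; it additionally treats `#` as the start of a comment, so it is not a drop-in equivalent of this sketch.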
The Copy feature

///
Program's History

This tool was originally written by Davide Mauri (2003). I used it a lot at work and for personal projects, and because it was open source I started adding the new features I needed. One day the program was so different that I wanted to give all these enhancements back to Davide, so I contacted him by email. He gave me permission to re-release it and sent me the link to Kurt Griffiths' version of the program (2006). I mixed his enhancements with mine and polished the UI for the real world.

Recommended links
Other links
Article History
Program History