A while ago, I put a little Regular Expression testing and transformation utility on my website (http://www.carljohansen.co.uk/utils/regexpdotnet.aspx). To my surprise, it has proven so handy that now I use it all the time - especially for bulk transformations. I am documenting it here in the hope that it might come in handy for you too.
Okay, there's nothing here that you can't get from many other applications. But the great thing about my utility is that you don't need any special software - just a browser. (You will need to know how to use Regular Expressions; the best regex reference I know is www.regular-expressions.info.)
So, what does it do? Basically, it gives you easy access to some of the functionality of the .NET Framework's Regex class. There are three major uses:
- Testing a Regular Expression to make sure that it does what you want.
- Getting a C# code snippet for performing the match.
- Performing a transformation (search and replacement or extraction) on some text.
Let's look at these in detail. To follow along, you might like to open the utility page in another tab.
Testing a Regular Expression
Regular Expressions can be hard to read, and the semantics are hard to remember, so when I'm creating an expression, I like to test it against a few sample inputs before putting it into my code. Let's say we want to test the following expression for matching UK postcodes:
We put the expression in the Regular Expression box, and put a few test cases in the Text to search box:
Then, we check Find all matches and click Find (shortcut key: Alt+F). This gives us the following output in the Result box:
Match #1 (char 1): G8 3LB
Match #2 (char 8): EC12 8QW
As we expected, the expression matches the first two lines, but not the last two (note that character indexes in the results are 1-based, and a line break counts as a character).
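For comparison, here is a minimal C# sketch that produces output in the same format. The pattern is a simplified stand-in assumed purely for illustration (it is not the expression tested above, and it is not a complete UK postcode validator), and the last two sample lines are invented non-matches.

```csharp
using System;
using System.Text.RegularExpressions;

class PostcodeTest
{
    static void Main()
    {
        // Assumed, simplified postcode pattern (illustration only).
        string pattern = @"^[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}$";

        // The first two lines appear in the output above; the last two are invented.
        string input = "G8 3LB\nEC12 8QW\nG8 3L\n12 ABC";

        int number = 1;
        foreach (Match match in Regex.Matches(input, pattern, RegexOptions.Multiline))
        {
            // Match.Index is 0-based, so add 1 to get a 1-based character position.
            Console.WriteLine($"Match #{number++} (char {match.Index + 1}): {match.Value}");
        }
    }
}
```

With these sample lines, the loop reports G8 3LB at char 1 and EC12 8QW at char 8, because the single line-break character between them counts towards the index.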
Getting the C# code snippets
I don't know about you, but I can never remember the details of the .NET Regex class. With the Regular Expression tester, I don't have to. Once I have my expression working against the source text (possibly with a few options selected, such as Match case), I just click Show C# Code (shortcut key: Alt+W), and I get a snippet of code for performing the match. Note that the Find all matches option determines whether the app generates a call to Regex.Match (single match) or Regex.Matches (all matches).
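To show the difference between the two calls, here is a rough sketch. It is not the snippet that the utility generates; the pattern, the input, and the class name are hypothetical and chosen only to make the contrast visible.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchVersusMatches
{
    static void Main()
    {
        // Hypothetical pattern and input. Options such as RegexOptions.IgnoreCase
        // or RegexOptions.Multiline would be passed here instead of None.
        var regex = new Regex(@"\d+", RegexOptions.None);
        string input = "Order 66, aisle 7, shelf 12";

        // Single match: Regex.Match returns only the first match.
        Match first = regex.Match(input);
        if (first.Success)
        {
            Console.WriteLine($"First match: {first.Value}");
        }

        // All matches: Regex.Matches returns a collection of every match.
        MatchCollection all = regex.Matches(input);
        Console.WriteLine($"Total matches: {all.Count}");
    }
}
```

Here Regex.Match stops at 66, while Regex.Matches finds all three numbers.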
My original motive for developing the regex tester was to test expressions like the postcode example above. But if that were the only feature, then I probably wouldn't use it too often. What makes this utility indispensable is its ability to do search-and-replace. This feature comes in two flavours: Replace and Extract. In both cases, the app finds all the matches of your Regular Expression and replaces each one with your replacement expression. The difference is in the handling of non-matching text: Replace leaves it intact, while Extract drops it.
The Replace feature
We'll look at Replace first. To illustrate it, let's contrive an example. In our C# code, we have a bunch of method calls (interspersed with comments):
Now, let's say we have redesigned the Customer class and we want to turn these calls into assignments to new properties called Name and Age:
customer1.Name = "F Smith";
customer1.Age = 31;
customer2.Name = "J Jones";
customer2.Age = 42;
We could convert each line manually (raise your hand if you've ever done that...), but if there are more than about ten lines, then a regex transformation is going to save us a lot of hassle. First, we put the original code into the Text to search box, then we put this expression into the Regular expression box:
Check the Multiline mode box to perform line-by-line matching of ^ and $ (the start and end anchors).
Before we do the replacement, let's do a quick check to see if we have the right expression. Leave the Replacement text box empty, check the Find all matches box, and click Find. We get the following output in the Results box:
Match #1 (char 1): customer1.SetName("F Smith");
* Group #1: customer1
* Group #2: Name
* Group #3: "F Smith"
Match #2 (char 31): customer1.SetAge(31);
That's just what we need. We can use the substitution patterns $1, $2, etc., in the replacement expression to refer to the subgroups. Put the following expression into the Replacement text box:
$1.$2 = $3;
(Note that the replacement text must follow the rules for replacement patterns.) Now, when we click Replace (shortcut key: Alt+R), we get the required output in the Results box.
Note that the Find all matches box has no effect when we are performing a replacement. By default, the replacement operation replaces all occurrences of the regex in the source text with the replacement text. You can limit the number of replacements by entering a number in the Max replaces box.
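In code, the same transformation might look something like the sketch below. The pattern is an assumption written to produce the three groups shown in the match output (it is not necessarily the expression used above), the comment line in the input is invented, and the customer2 calls are inferred from the desired output. The third argument to Replace plays the role of the Max replaces box.

```csharp
using System;
using System.Text.RegularExpressions;

class ReplaceSketch
{
    static void Main()
    {
        // Assumed pattern: group 1 = object, group 2 = property name, group 3 = argument.
        var regex = new Regex(@"^(\w+)\.Set(\w+)\((.*)\);$", RegexOptions.Multiline);

        string input =
            "customer1.SetName(\"F Smith\");\n" +
            "customer1.SetAge(31);\n" +
            "// Second customer\n" +              // invented comment line
            "customer2.SetName(\"J Jones\");\n" +
            "customer2.SetAge(42);";

        // Replace every match; non-matching lines (the comment) pass through unchanged.
        Console.WriteLine(regex.Replace(input, "$1.$2 = $3;"));

        Console.WriteLine("---");

        // Replace at most two matches; the customer2 calls are left as they were.
        Console.WriteLine(regex.Replace(input, "$1.$2 = $3;", 2));
    }
}
```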
The Extract feature
In that example, we used the Replace feature because we wanted to include the non-matching lines (the code comments) in the output. But what if we want to throw away anything that doesn't match? We could maybe do it with a complicated regex and replacement pattern, but it's much easier to use the Extract feature. Let's extract the last word from each line in this snippet from The Gondoliers:
That King, although no one denies
His heart was of abnormal size,
Yet he'd have acted otherwise
If he had been acuter.
The end is easily foretold,
When every blessed thing you hold
Is made of silver, or of gold,
You long for simple pewter.
When you have nothing else to wear
But cloth of gold and satins rare,
For cloth of gold you cease to care--
Up goes the price of shoddy.
In short, whoever you may be,
To this conclusion you'll agree,
When every one is somebodee,
Then no one's anybody!
We can find the last word on each line with this:
Check the Multiline mode box to perform line-by-line matching of $.
We can leave the Replacement Text box blank, but in that case, the Extract feature will simply write out all the matches contiguously. More often, we will wrap each match in some surrounding text. In this exercise, we will simply comma-separate them, so put this replacement pattern in the Replacement Text box:
Now, clicking Extract (shortcut key: Alt+X) gives us this output:
In this case, we used $1 to get the subgroup that holds the non-punctuation characters, but remember that you can also use $0 to get the text of the whole match. (By the way, for the Extract feature, leaving the replacement pattern blank is equivalent to setting it to $0.)
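The .NET Regex class has no single method that behaves like Extract, but the idea can be sketched by expanding the replacement pattern for each match with Match.Result and concatenating the results. This is only one possible approach, not necessarily how the utility works. The Extract helper is hypothetical, and the pattern and the "$1, " replacement are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class ExtractSketch
{
    // Hypothetical helper: keep only the matches, each expanded through the
    // replacement pattern; everything that does not match is dropped.
    static string Extract(string input, string pattern, string replacement, RegexOptions options)
    {
        // A blank replacement pattern is treated as $0 (the whole match).
        if (replacement.Length == 0)
        {
            replacement = "$0";
        }

        var pieces = Regex.Matches(input, pattern, options)
                          .Cast<Match>()
                          .Select(m => m.Result(replacement));
        return string.Concat(pieces);
    }

    static void Main()
    {
        string input =
            "That King, although no one denies\n" +
            "His heart was of abnormal size,\n" +
            "Yet he'd have acted otherwise\n" +
            "If he had been acuter.";

        // Assumed pattern: the last word ($1), any trailing punctuation on the
        // same line, then the end of the line.
        string pattern = @"(\w+)[^\w\n]*$";

        // Prints: denies, size, otherwise, acuter,
        Console.WriteLine(Extract(input, pattern, "$1, ", RegexOptions.Multiline));
    }
}
```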
Line breaks in the replacement pattern
You might have noticed that the Replacement Text box is in multi-line mode. This allows you to put line breaks into the replacement pattern (by pressing Enter!). For example, in the last exercise, we could have used a line break instead of a comma to separate the words. Of course, this is different from putting \n in the replacement pattern. If your replacement pattern contains \n, then the output will contain just that: a backslash followed by an n (which might be what you want in some cases). To get an actual line break in the output, you must put an actual line break in the replacement pattern.
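The same distinction shows up when calling Regex.Replace from C#, because .NET does not interpret character escapes inside replacement patterns. A small sketch, with a made-up input string:

```csharp
using System;
using System.Text.RegularExpressions;

class ReplacementLineBreaks
{
    static void Main()
    {
        string input = "red,green,blue";

        // Verbatim string: the replacement is the two characters \ and n.
        // Replacement patterns do not process character escapes, so both
        // characters are copied to the output as-is: red\ngreen\nblue
        Console.WriteLine(Regex.Replace(input, ",", @"\n"));

        // Regular string: the C# compiler turns \n into a real line break
        // before Regex sees it, so the output spans three lines.
        Console.WriteLine(Regex.Replace(input, ",", "\n"));
    }
}
```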
The regex language is hard to remember if you're only an occasional user (remind me again what's the difference between \w and \W?). The Regex Tester utility includes a link to the .NET Framework regex reference pages, which should answer all your questions about Regular Expressions.
The possibilities for these types of operations are endless. The feeling you get when you hit Replace and see 50 new lines of code ready to run takes some beating.
One thing's for sure: I'm no regex expert. If you find this utility useful or can see how it could be improved, then please let me know.