Introduction
The .NET framework provides a powerful class Regex
for creating and using Regular Expressions. While regular expressions are a powerful way to parse, edit, and replace text, the complex syntax makes them hard to understand and prone to errors, even for the experienced user. When developing code that uses regular expressions, I have found it very helpful to create and debug my expressions in a separate tool to avoid time consuming compile/debug cycles. Expresso enables me to do the following:
Build complex regular expressions by selecting components from a palette
Test expressions against real or sample input data
Display all matches in a tree structure showing captured groups, and all captures within a group
Build replacement strings and test the match and replace functionality
Highlight matched text in the input data
Automatically test for syntax errors
Generate Visual Basic or C# code that can be incorporated directly into programs
Read or save regular expressions and input data
Background
Regular expressions are a sophisticated generalization of the "wildcard" syntax that users of Unix, MSDOS, Perl, AWK, and other systems are already familiar with. For example, in MSDOS, one can say:
dir *.exe
to list all of the files with the exe extension. Here the asterisk is a wildcard that matches any character string and the period is a literal that matches only a period. For decades, much more complex systems have been used to great advantage whenever it is necessary to search text for complex patterns in order to extract or edit parts of that text. The .NET framework provides a class called Regex
that can be used to do search and replace operations using regular expressions. In .NET, for example, suppose we want to find all the words in a text string. The expression \w will match any alphanumeric character (and also the underscore). The asterisk character can be appended to \w to match an arbitrary number of repetitions of \w, thus \w* matches all words of arbitrary length that include only alphanumeric characters (and underscores).
Expresso provides a toolbox with which one can build regular expressions using a set of tab pages from which any of the syntactical elements can be selected. After building an expression, sample data can be read or entered manually, and the regular expression can be run against that data. The results of the search are then displayed, showing the hierarchy of named groups that Regex
supports. The tool also allows testing of replacement strings and generation of code to be inserted directly into a C# or Visual Basic .NET program.
The purpose of this article is not to give a tutorial on the use of regular expressions, but rather to provide a tool for both experienced and novice users of regular expressions. Much of the complex behavior of regular expressions can be learned by experimenting with Expresso.
The reader may also find it helpful to explore some of the code within Expresso to see examples of the use of regular expressions and the Regex
class.
Using Expresso to Build and Test Regular Expressions on Sample Input Data
To use Expresso, download and run the executable file, which requires the .NET Framework. If you want to explore the code, download and extract the source files, open the solution within Visual Studio .NET, then compile and run the program. Expresso will start with sample data preloaded into the "Input Data" box (refer to the figure above). Create your own regular expression or select one of the examples from the list box. Click "Find Matches" to test the regular expression. The matches are shown in a tree structure in the "Results" box. Multiple groups and captures can be displayed by expanding the plus sign as shown above.
To begin experimenting with your own regular expressions, click the "Show Builder" button. It will display a set of tab pages as shown here:
In this example, the expression \P{IsGreek}{4,7}? has been generated by setting options that specify: match strings of four to seven characters, but with as few repetitions as possible, of any characters other than Greek letters. By clicking the "Insert" button, this expression will be inserted into the regular expression that is being constructed in the text box that contains the regular expression that will be used by the "FindMatch" or "Replace" buttons. Using the other tab pages, all of the syntactical elements of .NET regular expressions can be tested.
The Regex
class supports a number of options such as ignoring the case of characters. These options can be specified using check boxes on the main form of the application.
Replacement strings may be built using the various expressions found on the "Substitutions" tab. To test a replacement pattern, enter a regular expression, a replacement string, some input data, and then click the "Replace" button. The output will be shown in the "Results" box.
Using Expresso to Generate Code
Once a regular expression has been thoroughly debugged, code can be generated by selecting the appropriate option in the "Code" menu. For example, to generate code for a regular expression to find dates, create the following regular expression (or select this example from the drop down list):
(?<Month>\d{1,2})/(?<Day>\d{1,2})/(?<Year>(?:\d{4}|\d{2}))
By selecting the "Make C# Code" from the "Code" menu, the following code is generated. It can be saved to a file or cut and pasted into a C# program to create a Regex
object that encapsulates this particular regular expression. Note that the regular expression itself and all the selected options are built into the constructor for the new object:
using System.Text.RegularExpressions;
Regex regex = new Regex(
@"(?<Month>\d{1,2})/(?<Day>\d{1,2})/(?<Yea"
+ @"r>(?:\d{4}|\d{2}))",
RegexOptions.IgnoreCase
| RegexOptions.Multiline
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
Points of Interest
After creating this tool, I discovered the Regex Workbench by Eric Gunnerson, which was designed for the same purpose. Expresso provides more help in building expressions and gives a more readable display of matches (in my humble opinion), but Eric's tool has a nice feature that shows Tooltips that decode the meaning of subexpressions within an expression. If you are serious about regular expressions, try both of these tools!
History
Original Version: 2/17/03
Version 1.0.1148: 2/22/03 - Added a few additional features including the ability to create an assembly file and registration of a new file type (*.xso) to support Expresso Project Files.
Version 1.0.1149: 2/23/03 - Added a toolbar.
My brother John doesn't like the name Expresso, since he says too many people are already confused about the proper spelling and pronunciation of Espresso. Somehow, I doubt this program will have any impact on the declining literacy of America, but who knows. John prefers my earlier name "Mr. Mxyzptlk", after the Superman character with the unpronounceable name.
For the latest information on Expresso (if any), see http://www.ultrapico.com/ .