The Grid Processor: Word Processing Abilities for the .NET DataGridView Component

Evan Stein

4.87/5 (14 votes)

May 7, 2007

11 min read

66413

1571

A plugin which offers search and replace, casing and other capabilities for the Microsoft .NET DataGridView component

Introduction

I have a lot of applications which use the DataGridView component. The DataGridView is a real work horse, though sometimes, the data it holds can be a bit worrying! In particular, I get a lot of text from FreeDB, where volunteers type in their CD titles. Each data enterer has their own unique "style". Some people are fussy and consistent, where others splatter the walls with happy typing.

After spending way too much time on corrections, I started to wish that the grid were more like a word processor, where you can do global replacements. This component is an answer to that need - something that plugs into the grid, and allows an end-user to whip data into shape without having to do the whole thing by hand.

The GridProcessor component is a window, which works like the Search and Replace window in a word processor. It allows you to keep it open while you work on the data underneath (and like the real thing, it sits on top of the item that it's found!)

Screenshot - GridProcessor.gif

It has four tabs for editorial and data cleaning functions, plus a fifth tab for macros.

Find	Finds a string, and moves to the cell that has that string.
Search and replace	Substitutes text. You can name, save and load specific expressions.
Upper/lower case	All the normal stuff, with the possibility of unusual things as well. Like the search and replace commands, you can store these items.
Split	Division of strings, assigning the pieces to the column(s) of your choice
Macros	Cleaning data always involves multiple operations. To take away drudgery, macros play sets of operations in a sequence.

For both philosophical and practical reasons, the testing box is a constant. It knows which tab is visible, and conducts the appropriate test on the string you provide.

Casing, Replacing and Regex

Most of the work revolves around replacement. Replacements can substitute one chunk of text for another (an eye for a tooth), or they can use regular expressions. Regular expressions let you to treat text abstractly, so they're extremely useful. They're even more useful for scaring people, and regular expressions can also be hard to work out again if you forget them. Therefore, the GridProcessor lets you save and recall expressions, and to give them names. This also makes it easier to give the component to non-technical end-users.

I was particularly pleased with the idea of using Regex for casing, though I'm sure I'm not the first person to think of that. Regular expressions don't have a provision for changing case, but we only need Regex to identify situations where we want to change case. After that, we can use regular string functions to do the work of uppering and lowering. The Regex Match object tells us both the start index and the length of the string we want to change, so it's easy to set up once we get there.

The benefit of the approach is that not only can we do upper, lower, and proper case, but we can also do some more curvy things. For instance, in Spanish only the first word of a title is capitalized, and everything else is lower case. We can make a rule called "Spanish title case". French people like to uppercase their surnames — «Like THIS» — so we could devise a "shout-your-SURNAME case" if we wanted to. Rather than catalogue all the things that people are likely to do, why not just give them the ability to do it when they think of it?

Grids and Documents

While there are similarities between grids and documents (hence the "GridProcessor" idea), the grid is a structured container, where a document is not. In a way, the grid is a bunch of tiny documents, each of which has the "subject" of its column header. What's significant for us is that we can choose which columns get updated, and we can leave the rest alone. In a word processing document, we either fire blindly or confirm each possible change. Here, we can fine-tune the changes, and thanks to the structure of the grid, even a global replacement can be more precise.

Because each cell is a small document, our expression logic may be slightly different. Also, changes are applied at the cell level, so if there are two instances of the word "the", for instance, both are changed, unless we specify an anchoring position. I had thought about allowing the user to confirm each change within a cell, but it wouldn't do them any favors when used from a macro. It would either plow through against their wishes, or leave unfinished business without telling anyone. It's best to get the expression right, so that it works the same way everywhere.

By the way, some of the changes could be surprising, so the component has a confirmation screen which shows the "before and after", with differences highlighted in blue. Currently, it only does the color-coding for literal replacements, leaving the user to work out what's changed when regular expressions are involved. Perhaps someone would like to submit an improved version with a good, color coded Diff() function.

Screenshot - ReplaceConfirm.gif

The Specifics

On initialization, the GridProcessor receives a grid reference as a parameter. Using the reference, it reads metadata from the grid, and compiles a list of columns that it can work with. At this point the component ignores everything except text (no numbers), and skips columns that are invisible. There's an array of "allowable" column classes, which can be added to if you've devised your own type of column.

The component keeps a live reference to the grid, and hence to all of the grid's data. Using the grid reference, the window can apply changes interactively, and can also control the location of the current cell.

At the base is a class called GridProcess, which holds instructions and working variables for transforming grid text. Search and replace and casing operations are both replacements, and they're represented by a Replacement class, which inherits GridProcess and adds a ReplaceExpression member. The Split also inherits from GridProcess, though its concerns are with columns. It retrieves the relatively-stable column headings from the database, turning them into more volatile column indices. Since it's possible for a split result to go into multiple columns, the result assignment works like a HashTable, with an array of column indices for each member of the string that's split. Below is a partial class diagram, made with Sacha Barber's wonderful AutoDiagrammer tool.

Screenshot - GridProcessorClasses.gif

There's a bit of overkill here, especially for a small component. The main advantage of the inheritance is that all of our objects can share the same data, which is strikingly similar. However, the functionality is quite different. One item evaluates an expression so we can jump to a cell. Another item transforms text. Another breaks up text and redistributes it. All these classes rely on external data, which is either in the grid, or in a text box (as part of a test). There are different needs, and each class has a very different factory function.

It would be nice for each object to have a Process() function which we could call in sequence, but that would involve a certain amount of violence. It's more natural to use the objects as data structures, and to program in a way that suits the grid we're working against. Each operation has a row function, and a simple, inner function which we can independently from the test harness box. While not exactly at the forefront of OO programming, at least we're tidy and symmetrical.

The inheritance scheme does pay off again when we want to play a macro. It's conceptually simple to have a list of commands, and to play each one in sequence. We compile a list of GridProcess items (our base class), and dispense with each one in a loop.

The macro takes a generic list of type GridProcess (the base class)

List<GridProcess> processList;
...
processList = Util.RetrieveMacroProcesses(macro.ID, mGridColumns);
for (int i = 0; i < processList.Count; i++)
{
    ... execute each class of rule
}

Again, we have to cast each item to the proper type and then call the correct functions, but it's conceptually simple and it's orderly.

The Database

The database is an Access file, though it could be any structured database at all. I chose Access because the driver is readily available, and also because you can "cheat" by firing up Access itself if you need to. If you think your users may figure out how to do this as well, you can rename the file and extension, and supply a custom setup object. (The explanation is in the Quick Start section, below.)

Data retrieval is set up through OLEDB, so it would be extremely easy to port to another DBMS. Wherever possible I use the "official" combination of DataAdapters and DataSets, even though they have a mind of their own at times.

This is how the database looks:

Screenshot - GridProcessorERD.gif

Basically, the GridProcess table stores any type of process for transforming data. The Macro table stores a sequence of processes.

The main database issues were the usual .NET ones:

Telling the application where the data file is. The wizards like to hard-code locations. This application has a "Settings" class, which calls a function in the Utils class to find the main application directory. The function strips off the "bin\debug" stuff, so that it will look in the same place both for development and production.
Saving strings directly with SQL commands. (When it makes more sense than using a DataAdapter.) Common characters such as apostrophes can make a database throw a wobbly, and this varies between DBMSes. Even though it's a pain, it's best to use parameter objects, and let the driver-writers worry about all the things that could go wrong.
Resources. At times, The .NET wizards store their select, update and delete statements as resources inside a form. This is a major headache when changing data structures, and in any case, seems like an odd thing to do. Hopefully, I've included absolutely everything, and no further changes are necessary.
Getting an ID number from a sequence. (Not a .NET issue, but a general one.) The app re-selects the item it inserted to get the new ID.

Quick start

The component is meant to be pluggable. It automatically looks for a file called GridProcessor.mdb in the same folder as your application, and is clever enough to find it from your bin/debug directory during development.

The best way to use it is to declare a form-wide variable, set to null:

GridProcessor.FormGridProcessor mGridProcessor = null;

and then to initialize it only if that hasn't been done before. That way it remembers where it was every time you Show() it.

if (mGridProcessor == null)
    mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid);
mGridProcessor.Show(this);

If you have another location, another filename, or another connection string, you can pass a GridProcessor.Settings object, using an alternate constructor.

GridProcessor.Settings mySettings = new GridProcessor.Settings();
mySettings.DataDirectory = @"D:\MyDirectory";
mySettings.FileName = "OtherFile.abc"; // non-Access, or extension disguised
mySettings.ConnectionString = "Fripping lyshus wimbgunts, awhilst moongrovenly 
    kormzibs. Bleem miserable venchit! Bleem forever mestinglish asunder 
    frapt!";
...
if (mGridProcessor == null)
    mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid, 
    mySettings);

Using a custom setting, you can have multiple files for different grids in the same application, a single file for an application, or, if you like, a central file for everything (though there's no code for record-locking.)

The grid processor form also has a Grid property, which is the reference that the component uses. There's a fair bit of danger involved in exposing this, but it could be helpful in MDI applications. If you have a series of windows with the exact same grid, you can set the GridProcessor to operate on the topmost window.

Further Work

There are a lot more that can be done with such a component. Fields could be concatenated, or used in decision logic. There could be a settings file for the screen as a whole, with default columns, rows, and the preferred tab. There could be security settings to prevent tampering with saved items. There could be a library of functions which you could embed inside a replacement expression. The component could deal with numbers. And most important, there are the things I haven't thought of, that are perfectly obvious to other people who use it.

There's no reason any of this couldn't be done, excepting time, of course, and loads of work. The GridProcessor is a project on SourceForge, which you're welcome to join. It's here.

Conclusion

This component was a typical answer to a time-consuming situation — to step back and let the computer do the work. Of course, that took a bit of time in itself (call it "administrative overhead"), but the programming paid for itself during the first day of use, and saved some horrifically tedious typing. More than that, the code was immediately useful in several other projects, and took only a few lines to set up. The big job, as always, was whip the code into shape for public consumption. That said, I do hope that other people find this component helpful. (If not, I'll gladly refund the full price they paid for it!)

Please let me know what you think. I'd be interested in knowing where the component ended up, what sort of work it's doing, and which operations and expressions turned out to be the most useful. And if you like it, don't forget to vote.

History

03 May 2007 - First submission.