The Grid Processor: Word Processing Abilities for the .NET DataGridView Component






4.87/5 (14 votes)
May 7, 2007
11 min read

66304

1568
A plugin which offers search and replace, casing and other capabilities for the Microsoft .NET DataGridView component
Introduction
I have a lot of applications which use the DataGridView component. The DataGridView is a real work horse, though sometimes, the data it holds can be a bit worrying! In particular, I get a lot of text from FreeDB, where volunteers type in their CD titles. Each data enterer has their own unique "style". Some people are fussy and consistent, where others splatter the walls with happy typing.
After spending way too much time on corrections, I started to wish that the grid were more like a word processor, where you can do global replacements. This component is an answer to that need - something that plugs into the grid, and allows an end-user to whip data into shape without having to do the whole thing by hand.
The GridProcessor component is a window, which works like the Search and Replace window in a word processor. It allows you to keep it open while you work on the data underneath (and like the real thing, it sits on top of the item that it's found!)
It has four tabs for editorial and data cleaning functions, plus a fifth tab for macros.
Find | Finds a string, and moves to the cell that has that string. |
Search and replace | Substitutes text. You can name, save and load specific expressions. |
Upper/lower case |
All the normal stuff, with the possibility of unusual things as well. Like the search and replace commands, you can store these items. |
Split | Division of strings, assigning the pieces to the column(s) of your choice |
Macros | Cleaning data always involves multiple operations. To take away drudgery, macros play sets of operations in a sequence. |
For both philosophical and practical reasons, the testing box is a constant. It knows which tab is visible, and conducts the appropriate test on the string you provide.
Casing, Replacing and Regex
Most of the work revolves around replacement. Replacements can substitute one chunk of text for another (an eye for a tooth), or they can use regular expressions. Regular expressions let you to treat text abstractly, so they're extremely useful. They're even more useful for scaring people, and regular expressions can also be hard to work out again if you forget them. Therefore, the GridProcessor lets you save and recall expressions, and to give them names. This also makes it easier to give the component to non-technical end-users.
I was particularly pleased with the idea of using Regex for casing, though I'm sure I'm not the first person to think of that. Regular expressions don't have a provision for changing case, but we only need Regex to identify situations where we want to change case. After that, we can use regular string functions to do the work of uppering and lowering. The Regex Match object tells us both the start index and the length of the string we want to change, so it's easy to set up once we get there.
The benefit of the approach is that not only can we do upper, lower, and proper case, but we can also do some more curvy things. For instance, in Spanish only the first word of a title is capitalized, and everything else is lower case. We can make a rule called "Spanish title case". French people like to uppercase their surnames — «Like THIS» — so we could devise a "shout-your-SURNAME case" if we wanted to. Rather than catalogue all the things that people are likely to do, why not just give them the ability to do it when they think of it?
Grids and Documents
While there are similarities between grids and documents (hence the "GridProcessor" idea), the grid is a structured container, where a document is not. In a way, the grid is a bunch of tiny documents, each of which has the "subject" of its column header. What's significant for us is that we can choose which columns get updated, and we can leave the rest alone. In a word processing document, we either fire blindly or confirm each possible change. Here, we can fine-tune the changes, and thanks to the structure of the grid, even a global replacement can be more precise.
Because each cell is a small document, our expression logic may be slightly different. Also, changes are applied at the cell level, so if there are two instances of the word "the", for instance, both are changed, unless we specify an anchoring position. I had thought about allowing the user to confirm each change within a cell, but it wouldn't do them any favors when used from a macro. It would either plow through against their wishes, or leave unfinished business without telling anyone. It's best to get the expression right, so that it works the same way everywhere.
By the way, some of the
changes could be
surprising, so the component has a confirmation screen which shows the
"before and after", with differences highlighted in blue.
Currently, it only does the color-coding for literal replacements,
leaving the user to work out what's changed when regular
expressions are
involved. Perhaps someone would like to submit an improved version
with a good, color coded Diff()
function.
The Specifics
On initialization, the GridProcessor receives a grid reference as a parameter. Using the reference, it reads metadata from the grid, and compiles a list of columns that it can work with. At this point the component ignores everything except text (no numbers), and skips columns that are invisible. There's an array of "allowable" column classes, which can be added to if you've devised your own type of column.
The component keeps a live reference to the grid, and hence to all of the grid's data. Using the grid reference, the window can apply changes interactively, and can also control the location of the current cell.
At the base is a
class called GridProcess
, which holds
instructions and working
variables for transforming grid text. Search and replace
and
casing operations are both replacements, and they're
represented by a Replacement
class, which
inherits GridProcess
and adds a ReplaceExpression
member. The Split
also inherits from GridProcess
,
though its concerns are with columns. It retrieves
the relatively-stable column headings from the database, turning them
into more volatile column indices. Since it's possible for a split
result to go into
multiple columns, the result assignment works like a HashTable
,
with
an array of column indices for each member of the string that's
split. Below is a partial class diagram, made with Sacha Barber's
wonderful AutoDiagrammer tool.
There's a bit of overkill here, especially for a small component. The main advantage of the inheritance is that all of our objects can share the same data, which is strikingly similar. However, the functionality is quite different. One item evaluates an expression so we can jump to a cell. Another item transforms text. Another breaks up text and redistributes it. All these classes rely on external data, which is either in the grid, or in a text box (as part of a test). There are different needs, and each class has a very different factory function.
It would be nice
for each object to have a Process()
function
which we could call in
sequence, but that would involve a certain amount of violence. It's
more natural to use the objects as data structures, and to program in a
way that suits the grid we're working against. Each operation has a
row function, and a simple, inner function which we can independently
from the test harness box. While not exactly at the forefront of OO
programming, at least we're tidy and symmetrical.
The inheritance scheme does pay off again when we want to play a macro. It's conceptually simple to have a list of commands, and to play each one in sequence. We compile a list of GridProcess items (our base class), and dispense with each one in a loop.
The macro takes a generic list of type GridProcess (the base class)
List<GridProcess> processList;
...
processList = Util.RetrieveMacroProcesses(macro.ID, mGridColumns);
for (int i = 0; i < processList.Count; i++)
{
... execute each class of rule
}
Again, we have to cast each item to the proper type and then call the correct functions, but it's conceptually simple and it's orderly.
The Database
The database is an Access file, though it could be any structured database at all. I chose Access because the driver is readily available, and also because you can "cheat" by firing up Access itself if you need to. If you think your users may figure out how to do this as well, you can rename the file and extension, and supply a custom setup object. (The explanation is in the Quick Start section, below.)
Data
retrieval is set up through OLEDB, so it would be extremely easy to
port to another DBMS. Wherever possible I use the "official" combination of
DataAdapter
s and DataSet
s,
even though they have a mind of their own at times.
This is how the database looks:
Basically, the GridProcess table stores any type of process for transforming data. The Macro table stores a sequence of processes.
The main database issues were the usual .NET ones:
-
Telling the application where the data file is. The wizards like to hard-code locations. This application has a "Settings" class, which calls a function in the Utils class to find the main application directory. The function strips off the "bin\debug" stuff, so that it will look in the same place both for development and production.
-
Saving strings directly with SQL commands. (When it makes more sense than using a
DataAdapter
.) Common characters such as apostrophes can make a database throw a wobbly, and this varies between DBMSes. Even though it's a pain, it's best to use parameter objects, and let the driver-writers worry about all the things that could go wrong. -
Resources. At times, The .NET wizards store their
select
,update
anddelete
statements as resources inside a form. This is a major headache when changing data structures, and in any case, seems like an odd thing to do. Hopefully, I've included absolutely everything, and no further changes are necessary. - Getting an ID number from a sequence. (Not a .NET issue, but a general one.) The app re-selects the item it inserted to get the new ID.
Quick start
The component is meant to be pluggable. It automatically looks for a file called GridProcessor.mdb in the same folder as your application, and is clever enough to find it from your bin/debug directory during development.
The best way to use it is to declare a form-wide variable, set to null:
GridProcessor.FormGridProcessor mGridProcessor = null;
and then to
initialize it only if that hasn't been done before. That way it
remembers where it was every time you Show()
it.
if (mGridProcessor == null)
mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid);
mGridProcessor.Show(this);
If you have another location, another filename,
or another connection string, you can pass a
GridProcessor.Settings
object, using an
alternate constructor.
GridProcessor.Settings mySettings = new GridProcessor.Settings();
mySettings.DataDirectory = @"D:\MyDirectory";
mySettings.FileName = "OtherFile.abc"; // non-Access, or extension disguised
mySettings.ConnectionString = "Fripping lyshus wimbgunts, awhilst moongrovenly
kormzibs. Bleem miserable venchit! Bleem forever mestinglish asunder
frapt!";
...
if (mGridProcessor == null)
mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid,
mySettings);
Using a custom setting, you can have multiple files for different grids in the same application, a single file for an application, or, if you like, a central file for everything (though there's no code for record-locking.)
The grid processor form also has a Grid
property, which is the reference that the component uses.
There's a fair bit of danger involved in
exposing this, but it could be helpful in MDI applications. If you have
a
series of windows with the exact same grid, you can set the
GridProcessor to operate on the topmost window.
Further Work
There are a lot more that can be done with such a component. Fields could be concatenated, or used in decision logic. There could be a settings file for the screen as a whole, with default columns, rows, and the preferred tab. There could be security settings to prevent tampering with saved items. There could be a library of functions which you could embed inside a replacement expression. The component could deal with numbers. And most important, there are the things I haven't thought of, that are perfectly obvious to other people who use it.
There's no reason any of this couldn't be done, excepting time, of course, and loads of work. The GridProcessor is a project on SourceForge, which you're welcome to join. It's here.
Conclusion
This component was a typical answer to a time-consuming situation — to step back and let the computer do the work. Of course, that took a bit of time in itself (call it "administrative overhead"), but the programming paid for itself during the first day of use, and saved some horrifically tedious typing. More than that, the code was immediately useful in several other projects, and took only a few lines to set up. The big job, as always, was whip the code into shape for public consumption. That said, I do hope that other people find this component helpful. (If not, I'll gladly refund the full price they paid for it!)
Please let me know what you think. I'd be interested in knowing where the component ended up, what sort of work it's doing, and which operations and expressions turned out to be the most useful. And if you like it, don't forget to vote.
History
03 May 2007 - First submission.