I have a lot of applications which use the DataGridView
DataGridView is a real work horse, though sometimes, the data it holds
can be a bit worrying! In particular, I get a lot of text from FreeDB,
where volunteers type in their CD titles. Each data enterer has their
own unique "style". Some people are fussy and
consistent, where others splatter the walls with happy typing.
After spending way too much time on corrections, I started to
the grid were more like a word processor, where you can do global
replacements. This component is an answer to that need -
something that plugs into the grid, and allows an end-user to whip data
into shape without having to do the whole thing by hand.
The GridProcessor component is a window, which works like the
and Replace window in a word processor. It allows you to keep it open
while you work on the data underneath (and like the real thing, it sits
top of the item that it's found!)
It has four tabs for editorial and data cleaning functions,
plus a fifth tab for macros.
|Find ||Finds a string, and
moves to the cell
that has that string.|
Search and replace
Substitutes text. You
can name, save
and load specific expressions.|
|All the normal stuff,
possibility of unusual things as well. Like the search and replace
commands, you can store these items.
Division of strings,
pieces to the column(s) of your choice
Cleaning data always
operations. To take away drudgery, macros play sets of operations in
For both philosophical and practical reasons, the testing box
is a constant. It knows which tab is visible, and conducts the
appropriate test on the string you provide.
Casing, Replacing and Regex
Most of the work revolves around replacement. Replacements can
substitute one chunk of text for another (an eye for a tooth), or they
can use regular expressions. Regular expressions let you to treat text
abstractly, so they're extremely useful. They're even more useful for
scaring people, and regular expressions can also be hard to work out
again if you forget them. Therefore, the GridProcessor lets you save
and recall expressions, and to give them names. This also makes it
easier to give the component to non-technical end-users.
I was particularly pleased with the idea of using Regex for
casing, though I'm sure I'm not the first person to think of that.
Regular expressions don't have a provision for changing case, but we
only need Regex to identify
situations where we want to change case.
After that, we can use regular string functions to do the work of
uppering and lowering. The Regex Match object tells us both the start
index and the length of the string we want to change, so it's easy to
set up once we get there.
The benefit of the approach is that not only can we do upper,
lower, and proper case, but we can also do some more curvy things. For
instance, in Spanish only the first word of a title is capitalized, and
everything else is lower case. We can make a rule called "Spanish title
case". French people like to
uppercase their surnames — «Like
could devise a "shout-your-SURNAME case" if we wanted to. Rather
than catalogue all the things that people are likely to do, why not
just give them the ability to do it when they think of it?
Grids and Documents
While there are similarities
between grids and documents (hence the "GridProcessor" idea), the
grid is a structured container, where a document is not. In a way, the
grid is a bunch of tiny
documents, each of which has the "subject" of its column header.
What's significant for us is that we can choose which columns get
updated, and we can leave the rest alone. In a word processing
document, we either fire blindly or confirm each possible change.
Here, we can fine-tune the changes, and thanks to the structure of the
grid, even a global replacement can be more precise.
Because each cell is a
small document, our expression logic may be slightly
changes are applied at the cell level, so if there are two instances
of the word "the", for instance, both are changed, unless we
specify an anchoring position. I had thought about allowing the user
to confirm each change within a cell, but it wouldn't do them any
favors when used from a macro. It would either plow through against
their wishes, or leave unfinished business without telling anyone.
It's best to get the expression right, so that it works the same
By the way, some of the
changes could be
surprising, so the component has a confirmation screen which shows the
"before and after", with differences highlighted in blue.
Currently, it only does the color-coding for literal replacements,
leaving the user to work out what's changed when regular
involved. Perhaps someone would like to submit an improved version
with a good, color coded
On initialization, the
receives a grid reference as a parameter. Using the reference, it
reads metadata from the grid, and compiles a list of columns that it
can work with. At this point the component ignores everything except
text (no numbers), and skips columns that are invisible. There's an
array of "allowable" column classes, which can be
added to if
you've devised your own type of column.
The component keeps a live
the grid, and hence to all of the grid's data. Using the grid
window can apply changes interactively, and can also control the
location of the current cell.
At the base is a
GridProcess, which holds
instructions and working
variables for transforming grid text. Search and replace
casing operations are both replacements, and they're
represented by a
Replacement class, which
GridProcess and adds a
Split also inherits from
though its concerns are with columns. It retrieves
the relatively-stable column headings from the database, turning them
into more volatile column indices. Since it's possible for a split
result to go into
multiple columns, the result assignment works like a
an array of column indices for each member of the string that's
split. Below is a partial class diagram, made with Sacha Barber's
wonderful AutoDiagrammer tool.
There's a bit of
overkill here, especially for a small component. The main advantage
of the inheritance is that all of our objects can share the same
data, which is strikingly similar. However, the functionality is
quite different. One item evaluates an expression so we can jump to a
cell. Another item transforms text. Another breaks up text and
redistributes it. All these classes rely on external data, which is
either in the grid, or in a text box (as part of a test). There are
different needs, and each class has a very different factory
It would be nice
for each object to have a
which we could call in
sequence, but that would involve a certain amount of violence. It's
more natural to use the objects as data structures, and to program in a
way that suits the grid we're working against. Each operation has a
row function, and a simple, inner function which we can independently
from the test harness box. While not exactly at the forefront of OO
programming, at least we're tidy and symmetrical.
The inheritance scheme does pay off again when we want to play
It's conceptually simple to have a list of commands, and to play each
in sequence. We compile a list of GridProcess items (our base class),
and dispense with each one in a loop.
The macro takes a
generic list of type GridProcess (the base class)
processList = Util.RetrieveMacroProcesses(macro.ID, mGridColumns);
for (int i = 0; i < processList.Count; i++)
... execute each class of rule
Again, we have to cast each item to the proper type and then
call the correct functions, but it's conceptually simple and it's
The database is an
Access file, though it could be any structured database at all. I chose
Access because the driver is readily available, and also because you
can "cheat" by firing up Access itself if you need to. If you think
your users may figure out how to do this as well, you can rename the file and
extension, and supply a custom setup object. (The explanation is in the
Quick Start section, below.)
retrieval is set up through OLEDB, so it would be extremely easy to
port to another DBMS. Wherever possible I use the "official" combination of
even though they have a mind of their own at times.
This is how the database looks:
GridProcess table stores any type of process for transforming data.
The Macro table stores a sequence of processes.
The main database
issues were the usual .NET ones:
where the data file is. The wizards like to hard-code
application has a "Settings" class, which calls a function in the Utils
class to find the main application directory. The function strips off
the "bin\debug" stuff, so that it will look in the
same place both for development and production.
with SQL commands. (When it makes more sense than using a
Common characters such as
can make a
database throw a wobbly, and this varies between DBMSes. Even though
it's a pain, it's best to use parameter objects, and let
driver-writers worry about all the things that could go wrong.
At times, The .NET
wizards store their
statements as resources inside a form.
This is a major headache when changing data structures, and in any
case, seems like an odd
thing to do. Hopefully, I've included absolutely
everything, and no further changes are necessary.
- Getting an ID
number from a sequence. (Not a .NET issue, but a general
one.) The app
re-selects the item it inserted to get the new ID.
The component is
meant to be pluggable. It automatically looks for a file called
GridProcessor.mdb in the same folder as your application, and is
clever enough to find it from your bin/debug directory during
The best way to
use it is to declare a form-wide variable, set to null:
GridProcessor.FormGridProcessor mGridProcessor = null;
and then to
initialize it only if that hasn't been done before. That way it
remembers where it was every time you
if (mGridProcessor == null)
mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid);
If you have another location, another filename,
or another connection string, you can pass a
GridProcessor.Settings object, using an
GridProcessor.Settings mySettings = new GridProcessor.Settings();
mySettings.DataDirectory = @"D:\MyDirectory";
mySettings.FileName = "OtherFile.abc";
mySettings.ConnectionString = "Fripping lyshus wimbgunts, awhilst moongrovenly
kormzibs. Bleem miserable venchit! Bleem forever mestinglish asunder
if (mGridProcessor == null)
mGridProcessor = new GridProcessor.FormGridProcessor(dataGridView_myGrid,
Using a custom setting, you can have multiple files
in the same application, a single file for an application, or, if you
like, a central file for everything (though there's no code for
The grid processor form also has a
property, which is the reference that the component uses.
There's a fair bit of danger involved in
exposing this, but it could be helpful in MDI applications. If you have
series of windows with the exact same grid, you can set the
GridProcessor to operate on the topmost window.
There are a lot
more that can be done with such a component. Fields could be
concatenated, or used in decision logic. There could be a settings
file for the screen as a whole, with default columns, rows, and the
preferred tab. There could be security settings to prevent tampering
with saved items. There could be a library of functions which you
could embed inside a replacement expression. The component could deal
with numbers. And most important, there are the things I
haven't thought of, that are perfectly obvious to other people who use
There's no reason any of this couldn't be done, excepting
course, and loads of work. The GridProcessor is a project on
SourceForge, which you're welcome to join. It's
This component was a typical answer to a time-consuming
situation — to step back and let the computer do the work. Of
course, that took a bit of time in itself (call it "administrative
overhead"), but the programming paid for itself during the first day of
use, and saved some horrifically tedious typing. More than that, the
code was immediately useful in several other projects, and took only a
few lines to set up. The big
job, as always, was whip the code into shape for public consumption.
That said, I do hope that other people find this component helpful. (If
not, I'll gladly refund the full price
they paid for it!)
Please let me know what you think. I'd be
interested in knowing where the component ended up, what sort of work
it's doing, and which operations and expressions turned out to be the
most useful. And if you like it, don't forget to vote.
03 May 2007 - First submission.