Click here to Skip to main content
Click here to Skip to main content

Unravelling the Data – Ill-Formatted Data

, 10 Oct 2010
Rate this:
Please Sign up or sign in to vote.
A project post-mortem: Making sense out of a bunch of bad data

Read the background to this post.

When Bad Data Is Required

Fixing the data in the legacy system was not something that could be done in place. What I would refer to as ‘bad’ data was in some cases the glue that held reports together and made things like billing work.

This was one of the first things I had to resolve. My original approach was that I was going to try to “self-heal” the data through a combination of regular expressions, string replacements and templated hints and helpers. With the sheer number of discrepancies, this approach was DOA, and manual intervention was required.

A Side Order of Data Fixin’

I took a snapshot of the database and added additional columns to the tables where combined data was present. To understand ‘combined data’, a little background will help.

image

At various points in the application lifecycle, the management had decided that they weren't going to use the fields for their original purpose and started using them for a new one. In other scenarios, they decided to use the fields in one context for some customers and in a different context for other customers.

Depending on the customer and how long it took employees to shake old habits, these fields were used in differing ways over extended periods of time. Furthermore, even if there was a clear drawing point, none of the records in the database have a last modified date or any kind of audit log that reveals when a customer record is modified (in a meaningful way).

Thus, my side order approach faced another problem: there was no clear cut of the data and the existing applications needed to keep running. A snapshot of data today wouldn't help in the transition 6 months down the road.

The Birth of the Transition Platform

The solution was to create an ASP.NET MVC application, hosted only on the intranet, that used much of my original approaches to identifying bad data, but left the “healing” to an end user.

Where possible, I used jQuery to look up context-based fixes through controller actions and mashed-up some save functionality by POSTing to the legacy ASP pages of the original application. Where it wasn't possible (where functionality would be affected by changes to data) I created proxy tables to house the ‘corrected’ version of the data and wrote some monitors to periodically check to make sure that data was up-to-date.

I grouped functionality of the fixes into distinct controllers. For instance, anything related to a billing address was in the BillingAddressController with actions to support the corrections required for errors related to that piece. The models focused on model-view versions of the “bad data” and I used repositories to not only connect to the legacy system, but also to maintain a worklog of outstanding and completed tasks.

This worked great, as I was also able to say, at any given point, where we were at percentage-wise for correcting any set of data.

This process continues on today, and time is devoted to cleaning data each week. All three of the legacy systems continue to get (mis)used, though accuracy has been greatly improved. As users became aware of expected formats, they also became more conscience of how they were entering the data into the older software.

This first win made the next steps more plausible.

Next up: Data that Could be Auto-Corrected

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

TheyCallMeMrJames
Software Developer (Senior)
Canada Canada
For almost 30 years I have been working with computers, learning a myriad of languages and participating in various computing environments.
 
Though I have been mentoring other developers for the last decade, I have recently found a strong interest in writing and am learning to translate the fun of in-person teaching to something I can get into article form.
 
I enjoy following technology trends, new gadgets and trying to guess where innovation will lead us next (I'm kinda holding out for a robot-served utopia, but willing to submit to our future robot leaders).
 
I am a guy who is passionate about my faith, my family and a cure for Juvenile Diabetes (my son lives with this disease).

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Mobile
Web01 | 2.8.140709.1 | Last Updated 11 Oct 2010
Article Copyright 2010 by TheyCallMeMrJames
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid