Click here to Skip to main content
Click here to Skip to main content

Data Extraction and Manipulation

, 24 Apr 2007
Rate this:
Please Sign up or sign in to vote.
Extracts data from database, manipulates extracted data, and then updates the database

Screenshot - pic1.jpg

Introduction

This article comes from a need I had to extract data from a database, manipulate the data in some way, and then update the database. I could have written something short and sweet to get the job done quickly, and in fact this application didn't take long to develop. But as I laid out the algorithm (big word to describe a 'process of logical flow') to solve a need for the project I am working on, I thought I would go ahead and develop the application to show others how to use some basic techniques.

Background

In my current project at work, I am part of a team that is developing a large company database. The database is not only a business database but also a GIS database. The problem is with a table that needs to be normalized. This table will need to have data split into two different tables plus a third table as a linking table because of the 'many to many' relationship that is currently in the table. Here is a sample of the table:

tblNode
SiteNm NodeID Device NodeNm SiteFuncID NodeKey Station KVolt
AUBREY 1848 CB_M20CS-3, DSC_M20-3, DSC_M20CS-3_BYPASS AUBREY 6803 ABR.138.0001 ABR 138
AUBREY 1849 LD_LD_C, CB_M20CS-3, DSC_M20CS-3_BYPASS AUBREY 6803 ABR.138.0002 ABR 138
AUBREY 1850 LD_LD_B, CB_M20CS-2, DSC_M20CS-2_BYPASS AUBREY 6803 ABR.138.2109 ABR 138
AUBREY 1851 LD_LD_A, CB_M20CS-1, DSC_M20CS-1_BYPASS AUBREY 6803 ABR.138.2110 ABR 138
AUBREY 1852 CB_M20CS-2, DSC_M20-2, DSC_M20CS-2_BYPASS AUBREY 6803 ABR.138.3819 ABR 138
AUBREY 1853 DSC_M10, DSC_M20-2, DSC_M20-3, DSC_M60 AUBREY 6803 ABR.138.3820 ABR 138
AUBREY 1854 CB_M20CS-1, DSC_M20CS-1_BYPASS, DSC_M30, DSC_M60 AUBREY 6803 ABR.138.3821 ABR 138
AUBREY 1855 DSC_M10, DSC_M70VW, LN_ABR_RECT AUBREY 6803 ABR.138.3830 ABR 138
AUBREY 1856 DSC_M30, DSC_M70VW, LN_ABR_KRGRV AUBREY 6803 ABR.138.3831 ABR 138

Take a look at how 'CB_M20CS-3' is part of two different records. Some quick information: we are dealing with devices (think of them as a place on a line) and nodes (think of these as the line). With that said, a node can have many different devices and devices can be on many different nodes. So let's get started.

Using the code

Main function of the Application

Our process (or algorithm) will be:

  1. Create a dataset with our three tables
  2. Retrieve the information from the 'device' column
  3. Extract out the different devices from that string
  4. Flow control: Is the first device located in the device table
    1. Yes: Update the linking table with the device ID and the new node ID
    2. No: Create a new record in the device table and then update the linking table
  5. When all data has been handled update the database.

Now the logic control is done in the method ExtractDevices(). Here we make the calls to extract out the string of devices that are returned in List devices object. This was step 2 and 3 of our process. We then continue with step 4 and then loop through until the data has been processed. Here is a view of the control method:

private void ExtractDevices()
{            
    for (int i = 0; i < this.DS.Tables[0].Rows.Count; i++)
    {
        DataRow dr = this.DS.Tables[0].Rows[i];
        GetSingleDevices((string)dr["Device"]);

        for (int j = 0; j < this.devices.Count; j++)
        {
          string SQLFilter = "Station = '" + dr["Station"].ToString().Trim() + 
                "' AND DeviceNm = '" + this.devices[j].ToString().Trim() + "'";
            DataRow[] drcol = this.DS.Tables[1].Select(SQLFilter);
            //Filtering the datatable to find if the device is present

            if (drcol.Length > 0)
            {
                UpdateNodeDeviceLink(Convert.ToInt32(dr["NodeID"]), 
                    Convert.ToInt32(drcol[0]["DeviceID"]));
            }
            else
            {
                NewDevice(j, dr);
            }
        }
    }
}

Database

The next two methods I am going to show invoke database support. In the DBLoadDataSet() everything is easy to see and easy to use. I grouped my connection, adapter, and command builder into one class. This also gives me the advantage of using them at anytime, anywhere. And then in the DBUpdateDataSet() I show the use of an object[] array that I use to pass multiple parameters with. Also, the use of command builder makes coding much easier and quicker.

public static void DBLoadDataSet(ref OleClass OleStuff, ref DataSet DS, 
    MySqlData SqlData, string TableName)
{
    OleStuff.OleConn = new OleDbConnection(SqlData.MyConnectionString);
    OleStuff.OleAdpt = new OleDbDataAdapter(SqlData.MySQLString, 
        OleStuff.OleConn);
    OleStuff.OleCommBld = new OleDbCommandBuilder(OleStuff.OleAdpt);
    OleStuff.OleAdpt.MissingSchemaAction = MissingSchemaAction.AddWithKey;
    OleStuff.OleAdpt.Fill(DS, TableName);
}

public static void DBUpdateDataSet(object[] DataObjects)
{
    OleClass OleStuff = (OleClass)DataObjects[0];
    DataTable DT = (DataTable)DataObjects[1];
    MySqlData SqlData = (MySqlData)DataObjects[2];

    OleStuff.OleAdpt = new 
        System.Data.OleDb.OleDbDataAdapter(SqlData.MySQLString, 
        OleStuff.OleConn);
    OleStuff.OleCommBld = new 
        System.Data.OleDb.OleDbCommandBuilder(OleStuff.OleAdpt);
    OleStuff.OleAdpt.InsertCommand = OleStuff.OleCommBld.GetInsertCommand();
    OleStuff.OleAdpt.InsertCommand.Connection = OleStuff.OleConn;

    OleStuff.OleAdpt.InsertCommand.Connection.Open();
    OleStuff.OleAdpt.Update(DT);
    OleStuff.OleAdpt.InsertCommand.Connection.Close();
}

Pretty simply stuff. So let's add to it a little to show how some different things work. I added a background thread that will continuously update a label on the form, so we can look at multi-threading. Also, I add delegates and events to update textboxes on the form while we are running through the algorithm. As for the database - the actual data is stored on an SQL server but for this article I have imported the data into Access. For the Ole code, I will be using a library that I wrote in a previous article (click here). I add two methods and a class to this library that was lacking in the first version.

Multi-thread

Here I have created a new thread that when started will run the RunAnime method in the background. The RunAnime method will fire the delegate which will use Form.Invoke to call the method lblAnimeAddString to change the label on the form. We use 'Invoke' because we are change the form from a thread that did not create the form.

public delegate void FormAnime(string Char);
public FormAnime FormAnimeDelegate;
private Thread AnimeThread;

private void SetUpExtractor()
{     
     …
     FormAnimeDelegate = new FormAnime(this.lblAnimeAddString); 
     AnimeThread = new Thread(new ThreadStart(RunAnime));
     this.Stop = false; 
     AnimeThread.Start();
}

private void lblAnimeAddString(string Char)
{
    lblAnime.Text = Char;
    lblAnime.Refresh();
}

More Delegates and Events

In the class Extraction I have created a delegate and event for each of the textboxes on the form, in which each textbox is created for the data we will be handling. Now the difference between these delegates and the one used in the multi-threading section, is that the multi-threaded delegate invoked (made something happen) on the form. In the Extraction delegates the form is listening for the events to be fired whereas before it was not listening. Here is what some of the code looks like.

public class Extraction
{       
#region Delegates, Events, and Members
    public delegate void DeviceIDEventHandler(int DeviceID);
    public delegate void DeviceNameEventHandler(string DeviceName);
    public event DeviceIDEventHandler DeviceIDEvent;
    public event DeviceNameEventHandler DeviceNameEvent;    
#endregion
}

private void SetUpExtractor()
{        
    Extractor = new Extraction();        
    Extractor.DeviceIDEvent += new Extraction.DeviceIDEventHandler(
        Extractor_DeviceIDEvent);
    Extractor.DeviceNameEvent += new Extraction.DeviceNameEventHandler(
        Extractor_DeviceNameEvent);
    …
}

private void NewDevice(int Indexer, DataRow dr)
{        
    DataRow dr2 = this.DS.Tables[1].NewRow();        
    dr2["DeviceNm"] = this.devices[Indexer];        
    this._Device.DeviceName = this.devices[Indexer];        
    DeviceNameEvent(this._Device.DeviceName);
}

File Structure

Let me explain how and why I built the form class using the main partial file, design partial file, and ExtractorSetup file. I did this to demonstrate some clarity in the code and create a way to group the related code into its own file. So what you end up with is one file that handles things like the button and textbox events (this is the main file where the class declaration and inheritance of Form happens). Then you will have another file that sets up the design and initializes all of the components. Lastly, there is the 'ExtractorSetup' file in which all of my methods, properties, and members are located. Also, I have a method, SetUpExtractor(), that is called directly after Initializecomponent() which initializes all of my components.

Points of Interest

Like I said earlier, I wanted to write an article that demonstrated some basic techniques. One thing you should notice is the use of this. I use it in my Extraction class code but I tried not to use it in other classes. The this keyword refers to anything that is part of the class or to anything that is global within that (this) class.

Something else that is worth noting in MyDatabaseLibraryClass is the use of MySqlDta and OleClass. By using these classes I am grouping related objects and a more logical use of coding your classes. I tried to incorporate a few other techniques that are not necessary but allow you to see you the work. For instance, using the keyword ref in some of the methods, this will take the object and change it directly inside the called method. I also show you the use of object[] param in a couple of methods. This was done to show how to pass multiple objects to a method without having to use multiple parameters in the method declaration.

Conclusion

Well I hope that I have shed some light on a few basic techniques. If nothing else, I hope this article gives other developers a quick reference to some things you know but may not remember because of lack of use. I know I frequently need to do some sort of process but just flat out forget how something is done.

Take some time to look through the code, if you find something that could have been not just different but better let me know. I have also included an Access DB with the tables that this application works with. There is not an installation/setup file - you can load the source or just run the application. Thank you for your time and God Bless.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

lost in transition
Web Developer
United States United States
I am a software, database, and gis developer. I love the challenge of learning new ways to code.

Comments and Discussions

 
-- There are no messages in this forum --
| Advertise | Privacy | Mobile
Web02 | 2.8.140709.1 | Last Updated 24 Apr 2007
Article Copyright 2007 by lost in transition
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid