Click here to Skip to main content
15,891,951 members
Articles / Database Development / SQL Server
Article

Data Generator for SQL SERVER 2000,2005 and MSDE

Rate me:
Please Sign up or sign in to vote.
1.00/5 (10 votes)
21 Aug 200612 min read 88K   970   41   20
The tool can generate the specified number of rows for any table. Can be used for stress testing the application

Introduction

Generating a huge volume of data essentially is generating required number of insert statements for various tables taking into consideration that some columns are foreign keys and the values to be inserted into these columns are to be picked from the parent table. Keeping these things in mind this tool “RDGen” was developed. The following steps describe the logic used to populate the table with data.

  1. For the selected table, find all the columns in the table.
  2. For every column find its Data type and check if the column is a foreign key.
  3. If it is a foreign key, get a value from the parent table.
  4. If it is not a foreign key, generate a random value depending on the data type.
  5. Generate an Insert statement with the various values thus generated for the selected table.
  6. Repeat the steps 3, 4, 5 required numbers of times.

Installing and Compiling the Sample Code

Please find the source code attached along with this article. Change the value of the static variable ConnectionString in the file DatabaseCommunicationMethods.cs.

Defining the Problem

We need to get the list of tables from the database and display it to the user. The user should be able to select a table of his choice and generate the data for the table. For columns that are not foreign keys, we need to generate data randomly and insert into the table. Where are we going to get the data from? Even if it is a foreign key, how we are going to identify the parent table and pick a value randomly from it? Let us see as we proceed further.

Defining the Solution

Getting the required Metadata

1. List of tables can be obtained from the view INFORMATION_SCHEMA.TABLES. It lists all the tables and views in the database. Hence you need to add a filter condition TABLE_TYPE =’BASE TABLE’.

2. To get the list of columns and their data type and if the column is a foreign key the following query can be used.

SQL
SELECT 
COLUMN_NAME,
DATA_TYPE,
isnull(CHARACTER_MAXIMUM_LENGTH,0) AS 'LENGTH',
IS_NULLABLE,
ISNULL((SELECT 'Y' FROM SYSFOREIGNKEYS WHERE 
        FKEYID =ID AND FKEY=COLID),'N') as 'IsForeignKey',
Ordinal_Position,
SYSCOLUMNS.IsComputed,
IsIdentity,
IsRowGuidCol

FROM 
SYSCOLUMNS,
(SELECT
COLUMN_NAME,
IS_NULLABLE,
DATA_TYPE,
CHARACTER_MAXIMUM_LENGTH,
Ordinal_Position,
COLUMNPROPERTY(OBJECT_ID('#TABLE_NAME#'), 
               INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME, 
               'IsIdentity') AS IsIdentity,
COLUMNPROPERTY(OBJECT_ID('#TABLE_NAME#'), 
               INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME, 
               'IsRowGuidCol') AS IsRowGuidCol

FROM 
INFORMATION_SCHEMA.COLUMNS

WHERE
TABLE_NAME ='#TABLE_NAME#') AS A

WHERE 
ID 
IN
(SELECT ID FROM SYSOBJECTS WHERE TYPE='U' AND NAME ='#TABLE_NAME#') 
AND
A.COLUMN_NAME =NAME

Order By
Ordinal_Position

Replace the TABLE_NAME with the name of the selected table. SYSBOJECTS contain information regarding all the objects. SYSCOLUMNS contains information regarding all the columns. SYSFOREIGNKEYS contain the information regarding the foreignkeys.The query retrieves the id of the table from SYSOBJECTS and uses that to get the information from SYSCOLUMNS. Additionally we also fetch if the column is an identity key column, or if it is a GUID or a computed column.

We also need to know for a foreign key column , which is the parent table, what is the refernced column. This is where the system stored procedure “SP_FKEYS” becomes very useful. We can pass

Name of the selected table as a parameter(@FKTABLE_NAME) and get the list of all the foreign keys in it. We only need the following columns in the results returned by the stored procedure. They are

1. PKTABLE_NAME(Parent Table)

2. PKCOLUMN_NAME(Referenced Column)

3. FKCOLUMN_NAME(Foreign Key column in the selected table).

Populating the Foreign Key Column
Using the PKTABLE_NAME and PKCOLUMN_NAME we can construct the following SQL Statement

SELECT PKCOLUMN_NAME FROM PKTABLE_NAME

to get a value for the foreign key column FKCOLUMN_NAME in the selected table. However we need to get only one value. So the above query need to be modified as follows.

SELECT TOP 1 PKCOLUMN_NAME FROM PKTABLE_NAME

To select only one value. It would be better to fetch a random value from the parent table. To fetch a record randomly from any table I use the NewID() T-SQL function . Hence the above query can be modified as

SELECT TOP 1 PKCOLUMN_NAME FROM PKTABLE_NAME ORDER BY NEWID().

Populating the other columns

Now we know how to get the metadata and how to populate the values for foreign key columns. However how we are going to populate the other columns in the table?. Where are we going to get the values from? This is where the Random() class in the .NET framework comes in handy. For example

If we have a column of type datetime, this is how we can generate values for that column.

Have a list that contains the days from 1-28. Have a list that contains the 12 months.. Have a list of years that you need. Randomly pick a day , pick a month and pick a year. And combine in the format dd-MMM-YYY. (Alternately you can generate them in the DD-MM-YYYY format also. It is a matter of choice.) Similarly I have a list of characters both lower case and uppercase. I randomly pick various characters from the list depending on the column length and append them to generate a random value.

I found this technique to be suitable for all data types and easier to implement in a program too. The tool currently supports many data types leaving quite a few though. Now that the logic has been described in detail let us go ahead and find out how this has been implemented in the tool.

The Code

The above picture shows various classes in the RDGen tool. The column() class complements a column in a table. Table() class complements a table in a database. It has a list of columns. Foreign Key class complements a foreign key in the database. InsertStatement() class complements an INSERT statement. RandomGenerator() is the class which can generate a random character, integer,day,month,year or tinyint. DataBuilder() users the RandomGenerator() to build values for each column depending on the data type. DCM() class handles communication with the database server.( DCM stands for database communication methods.)

Let us discuss in detail various methods in each class and how they operate together to generate huge volume of data. Get ready to have a code walk through time to open the code and proceed along.

The column() class has the following properties

Name,

Data Type,

Length,

Nullable(to indicate if null values are allowed),

ForeignKey(To indicate if it is a foreignkey)

IsIdentity,

IsGuid,

ParentKeyColumn(Only for foreign keys this property would return a value),

ParentKeyTable(Only for foreignkey),

DataValue (To hold the generated value).

ForeignKey() Class contains the following properties:

FKColumnName(A column that is a foreign key) ,

PKParentTable(The Parent Table)

PKColumnName(Column in the Parent Table that is referenced).

The insertstaatement() class just complements the insert statement in SQL.

An Insert statement in SQL is of the following format.

INSERT INTO TABLE (COL1, COL2, COL3) VALUES (VALUE1, VALUE2, VALUE3).

It has the following members.:

BeginStatement that contains the value “Insert Into”.

_TableName for the name of the selected table.

OpenPara and Close Para for “(“ and “)” respectively.

MiddleStatement that contains “Values”.

ColumnNames to contain the names of columns separated by commas.

ColumnValues to contain the values separated by comma.

ToString() has been overridden to return the formatted INSERT statement.

RDGenTable() complements the table in a database. It has a list of “Column” class to complement the columns it has. It has an InsertStatement() class as a member to generate the required Insert Statement.It has a method PopulateColumnNames() .That will generate a string containing name of columns separated by commas. It also has a method called PopulateData() which populates the data for all the columns. PopulateData() in turn uses the other methods GetForeignKeyValue() that will fetch a value from the parent table and GetRandomValue() that will return a Randomly generated value depending on the datatype of the column. Identity columns should not be included in a Insert statement,same with computed columns also. The tool currently does not support GUIDs currently. The IncludeColumn() returns a Boolean value that decides if a column in a table will be included in the insert statement getting generated by the tool.

The Class DCM() (DatabaseCommunication.cs) handles the communication with the database. It has the method GetTableList() which returns all the tables in the database. GetColumnList() which returns the <st1:place w:st="on">Meta data regarding columns for the selected table. GetForeignKeyValue() which returns a value from the parent table for the particular foreign key.

The Classes RandomGenerator() and DataBuilder() are responsible for populating the values for columns that are not foreignkeys. RandomGenerator() generates a character at a time for various data types. DataBuilder() uses RandomGenerator() to generate as many characters as required.

The class RandomGenerator() has two HashTable :

MonthSet - one that contains all months

characterset – one contains all characters from A-Z and a-z.

RandomGenerator() has the following methods

1.GetDay()- that returns a value between 1 and 28.

2. GetMonth()- that returns a random value from MonthSet

3.GetYear()- that returns the current year.

4.GetChar- that returns a random value from characterset

5.GetInt()- that returns a number between 1 and 9.

6.GetTinyInt()- that returns a number between 1 and 255.

The class DataBuilder() users RandomGenerator() to generate values for various columns in a table.

It has the following methods.

1. GetBinary()

It does not return a binaryvalue.But instead generates a character value. It users the cast function in the insert statement to generate a binary value.

2. GetBit()

returns 0 or 1 randomly

3. GetBigInt()

Returns a value for Columns of type BigInt

4. GetChar()
Returns a string depending the column data type and length.

5. GetDateTime()

Returns a string to populate the column of data type datetime.

6. GetInt()

Returns a string to populate the column of data type int.

7. GetSmallInt()

Returns a string to populate the column of data type smallint.

8. GetText()

Returns a string of length 500 characters to populate columns of data type text and ntext.

If you see all functions a return a string. All these are concatenated to get the Insert Statement in the correct format as we will see later.

The resource files TableList.txt and ColumnList.txt contain the queries for getting the list of tables and getting the meta data regarding the columns of the selected table. As the SQL Statement is longer , It is better to put them in a resource file and read it in the application rather than hard coding it in the application.

Now that we have seen the description of the various classes in the tool, it is time for us to go ahead and find out how all these classes interact to generate the huge volume of records for the selected table.

Stress Data Generator tool has a simple UI. cbTRableList- A Combo box for displaying the list of tables, txtNoOfRows-A text box for entering the required number of records, txtOutput-A text box for showing the results of the execution, ColumnsGrid-A Grid for displaying the meta data, cmdGenerate-A button for triggering the data generation process and to display the metadata in the grid.

When the form gets loaded, The Combo box is filled with the names of the tables in the database to which the application is connected. This is achieved through GetTableList() method of DCM() class. The method returns a datatable and the datatable is set as the datasource for the combo box cbTableList.

Follow the code in the cmdGenerate button click event.

C#
RDGenTable RDGenSelectedTable = new RDGenTable(cbTableList.Text);
ColumnsGrid.DataSource = RDGenSelectedTable.TableColumns;
StreamWriter sr = new StreamWriter(cbTableList.Text + ".sql");

for (int i = 1; i <= iRequiredRows; i++)
{
    sr.WriteLine( RDGenSelectedTable.GetInsertStatement());
    sr.WriteLine("");
}

txtOutput.Text = " Execute the contents of " + cbTableList.Text + 
                 ".sql" + " in Query Analyzer";
sr.Close();

A new instance of the RDGenTable() gets created by passing the name of the selected table as a parameter to the constructor. Now in the constrctor of RDGenTable(), We have the following code.

C#
public RDGenTable(string TableName)
{
    this.Name = TableName;
    this.TableColumns = DCM.GetColumnList(this.Name);
    _InsertStatement.TableName = this.Name;
    PopulateColumnNames();
}

The static method of the DCM.GetColumnList() gets called by passing the selected table as the parameter name. Now the TableColumns which is a list of Class Column will be populated with the meta data for all the columns in the table.

Now the let us dive into GetColumnList method.

C#
List<ForeignKey> RDGenForeignKeys = new List<ForeignKey>();
SqlConnection RDGenConn = new SqlConnection(ConnectionString);
SqlCommand RDGenCommand = new SqlCommand("sp_fkeys", RDGenConn);
RDGenCommand.Parameters.AddWithValue("@fktable_name", TableName);
RDGenConn.Open();
RDGenCommand.CommandType = CommandType.StoredProcedure;
SqlDataReader RDGenForeignKeyReader = RDGenCommand.ExecuteReader();

foreach(System.Data.Common.DbDataRecord ForeignKeyRecord 
                                        in RDGenForeignKeyReader)
{
    ForeignKey RDGenForeignKey = new ForeignKey();
    RDGenForeignKey.PKColumnName = (string)ForeignKeyRecord["PKCOLUMN_NAME"];
    RDGenForeignKey.PKTableName = (string)ForeignKeyRecord["PKTABLE_NAME"];
    RDGenForeignKey.FKColumnName = (string)ForeignKeyRecord["FKCOLUMN_NAME"];
    RDGenForeignKeys.Add(RDGenForeignKey);
}

Here we first get the necessary foreign key information for the selected table by executing the stored procedure sp_fkeys with the parameter @fktable_name. The selected table name is passed as the parameter. RDGenForeignKeys list is populated with the parent table name, referenced column and foreign key column name in the selected table for every foreign key the table has.

Then the metadata regarding all the columns are obtained by executing the SQL in ColumnList.txt resource file associated with the application. Here the string #TABLE_NAME# is replaced with the selected table name.

C#
foreach(System.Data.Common.DbDataRecord ColumnRow in RDGenReader)
{
    Column TableColumn = new Column(); 
    TableColumn.Name = (string)ColumnRow[0];
    TableColumn.DataType = (string)ColumnRow[1];
    TableColumn.Length = (int)ColumnRow[2];
    TableColumn.Nullable = ((string)ColumnRow[3] == "Y" ? true : false);
    TableColumn.ForeignKey = ((string)ColumnRow[4] == "Y" ? true : false);

    /* if the column is a foreign key find the parent key and table */
    if (TableColumn.ForeignKey)
    {
        foreach (ForeignKey RDGenTableForeignKey in RDGenForeignKeys)
        {
            if (RDGenTableForeignKey.FKColumnName == TableColumn.Name)
            {
                TableColumn.ParentKeyColumn = 
                            RDGenTableForeignKey.PKColumnName;
                TableColumn.ParentKeyTable = RDGenTableForeignKey.PKTableName;
            }
        }
     }

     TableColumn.OrdinalPosition = (int)ColumnRow[5];
     TableColumn.IsComputed = ((int)ColumnRow[6] == 1? true : false);
     TableColumn.IsIdentity = ((int)ColumnRow[7] == 1 ? true : false);
     TableColumn.IsGuid = ((int)ColumnRow[8] == 1 ? true : false);
     RDGenColumns.Add(TableColumn);
}

We read the meta data from the database for the selected table and populate the RDGenColumns which is a list of the class Column. For every column that is a foreignkey we also populate ParentKeyColumn,ParentKeyTable by using RDGenForeignKeys list.

Now let us go back to constructor RDGenTable from where the method gets called. These are following lines that are to be executed after GetColumnList() gets called.

C#
_InsertStatement.TableName = this.Name;
PopulateColumnNames();

TableName of the instance _InsertStatement is set to the selected table.

After that PopulateColumnNames() method gets called. In this method the ColumnsNames of the _InsertStatement is populated with comma separated values of the column names of the selected table.(Note:If the column is identity column or a computed column or a GUID column , it will not be included here. Similarly if the column has a type that is not supported, it will not be included. Please refer the Additional notes in the article at the end.) At this stage we have populated one half of the insert statement for the selected table.(i.e)

INSERT INTO TABLE (COL1,COL2,COL3) VALUES

Now we have to generate the values for all the columns in the selected table. Remember foreign key columns will have to be handled here. Now let us go back to the place where class RDGenTable() is instantiated. For convenience of the reader , I have given the code again below.

C#
RDGenTable RDGenSelectedTable = new RDGenTable(cbTableList.Text);
ColumnsGrid.DataSource = RDGenSelectedTable.TableColumns;
StreamWriter sr = new StreamWriter(cbTableList.Text + ".sql");

for (int i = 1; i <= iRequiredRows; i++)
{
    sr.WriteLine( RDGenSelectedTable.GetInsertStatement());
    sr.WriteLine("");
}

txtOutput.Text = " Execute the contents of " + cbTableList.Text + 
                 ".sql" + " in Query Analyzer";

sr.Close();

if you see, GetInsertStatement() gets called the required number of times as specified.

Let us investigate the GetInsertStatement() method code now.

C#
public string GetInsertStatement()
{
    PopulateData();
    string InsertStatement = _InsertStatement.ToString();
    _InsertStatement.ColumnValues = "";
    return InsertStatement;
}

This just calls the PopulateData method and returns the InsertStatement. And PopulateData() method has the following method.

In the PopulateData() method we call the methods GetForeignKeyValue(),GetRandomValue() depending on if the column is a foreign key or not. The GetForeignKeyValue() method in turn uses GetForeignKey() of the DCM class to select a random value of the referenced column from the referenced table. Similarly GetRandomValue() uses DataBuilder() to generate value for the selected column.

This is a small tool to understand and extend . It does an effective job of various commercial tools that are available today in the market. For easy code navigation inside the VS.NET ,I use the free tool DPack from http://www.usysware.com/DPack/Default.aspx.

Additional Information.

The tool does not support the following data types.

Image,uniqueidentifier,sql_variant,sysname.

Conclusion

Thus very fast we can generate huge volume of data for various tables keeping into consideration the relationship among them. Happy <st1:city w:st="on"><st1:place w:st="on">Reading!

his is a small tool to understand and extend . It does an effective job of various commercial tools that are available today in the market. For easy code navigation inside the VS.NET ,I use the free tool DPack from http://www.usysware.com/DPack/Default.aspx.

Additional Information.

The tool does not support the following data types.

Image,uniqueidentifier,sql_variant,sysname.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Web Developer Cognizant Technology Solutions India Pvt Ltd
Canada Canada
Balachandar Ganesan has been working with CGI for the last 6 years. His hobbies are meta physics. He loves code generation. He is currently developing code generation tool with intellisense features.

Comments and Discussions

 
GeneralMy vote of 1 Pin
Member 112229806-Mar-15 10:56
Member 112229806-Mar-15 10:56 
QuestionExtending the tool.... Pin
plogan7315-Dec-11 15:48
plogan7315-Dec-11 15:48 
Questionforeign key column are not getting populated? Pin
Amit Chaudhary26-Aug-11 4:03
Amit Chaudhary26-Aug-11 4:03 
AnswerRe: foreign key column are not getting populated? Pin
dieuyam1-Sep-11 23:55
dieuyam1-Sep-11 23:55 
GeneralYou're not closing a reader Pin
chriswigley27-May-10 12:42
chriswigley27-May-10 12:42 
GeneralMy vote of 1 Pin
WakoAivan25-Apr-10 16:59
WakoAivan25-Apr-10 16:59 
QuestionIs there same way for Views Objects Pin
Mahdi Yousefi13-Aug-07 2:29
Mahdi Yousefi13-Aug-07 2:29 
GeneralSharing Some Resolutions !!! Pin
VishalDogra771-Jun-07 22:14
VishalDogra771-Jun-07 22:14 
QuestionHOW CAN I GET THE ROWS FROM OFFSET TO NUMBER OF ROW..?? Pin
Dipan Patel From Gandhinagar26-Oct-06 1:18
Dipan Patel From Gandhinagar26-Oct-06 1:18 
AnswerRe: HOW CAN I GET THE ROWS FROM OFFSET TO NUMBER OF ROW..?? Pin
ganesan balachandar26-Oct-06 1:33
ganesan balachandar26-Oct-06 1:33 
GeneralRe: HOW CAN I GET THE ROWS FROM OFFSET TO NUMBER OF ROW..?? Pin
Dipan Patel From Gandhinagar26-Oct-06 2:53
Dipan Patel From Gandhinagar26-Oct-06 2:53 
GeneralRe: HOW CAN I GET THE ROWS FROM OFFSET TO NUMBER OF ROW..?? Pin
ganesan balachandar26-Oct-06 3:39
ganesan balachandar26-Oct-06 3:39 
Generalvery bad Pin
Thomas Haller20-Aug-06 23:10
Thomas Haller20-Aug-06 23:10 
QuestionRe: very bad Pin
skole28-Sep-06 23:30
skole28-Sep-06 23:30 
GeneralRe: very bad Pin
shum2321-Jan-07 19:40
shum2321-Jan-07 19:40 
GeneralRe: very bad Pin
Thomas Haller2-Sep-07 3:32
Thomas Haller2-Sep-07 3:32 
GeneralDatareader Error Pin
tony197117-Mar-06 5:32
tony197117-Mar-06 5:32 
GeneralRe: Datareader Error Pin
barramacanbhaird14-Jul-08 15:32
barramacanbhaird14-Jul-08 15:32 
GeneralPlease reformat article Pin
Simone Busoli8-Mar-06 0:25
Simone Busoli8-Mar-06 0:25 
GeneralRe: Please reformat article Pin
Eli.M8-Mar-06 5:45
Eli.M8-Mar-06 5:45 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.