13,256,260 members (47,019 online)
Add your own
alternative version

#### Stats

20.4K views
11 bookmarked
Posted 26 Aug 2009

# Very fast test data generation using exponential INSERT

, 28 Aug 2009
 Rate this:
Please Sign up or sign in to vote.
Instead of using incremental INSERT to generate test data, this method effectly copies the existing data multiple times.

## Introduction

This article (my first) will describe an algorithm that enables a large amount of data to be generated very quickly using a SQL query. The test data can be static or incremental, such as “Item Name” and “Item ID”, respectively, as shown below:

## Background

One of the tasks I did in a project involves generating a testing table with 103,680,000 records. The conventional method of data generation would take a month; hence, a fast method of data insertion was required. The new method took only 5 hours.

## Using the code

### Conventional method – Sequential INSERT

The conventional way of generating a list of numbers from 0…100000 would be using a loop and an `INSERT` statement as follows:

```CREATE TABLE #tempTable([Item ID] [bigint], [Item Name] nvarchar(30))
DECLARE @counter int
SET @counter = 1
WHILE (@counter < 100000)
BEGIN
INSERT INTO #tempTable VALUES (@counter, 'Hammer')
SET @counter = @counter + 1
END
SELECT * FROM #tempTable
DROP TABLE #tempTable```

Let's call this method of data generation “Sequential INSERT”.

### New method – Exponential INSERT

The new method effectively makes a copy of the existing data and appends it as new data, and does so repeatedly until the desired amount of data is generated.

Here is the code for the exponential `INSERT`:

```CREATE TABLE #tempTable([Item ID] [bigint], [Item Name] nvarchar(30))
INSERT INTO #tempTable VALUES (1, 'Hammer')
WHILE((SELECT COUNT(*) FROM #tempTable) < 100000)
BEGIN
INSERT INTO #tempTable ([Item ID], [Item Name])
(SELECT [Item ID] + (SELECT COUNT(*) FROM #tempTable),
'Hammer' FROM #tempTable)
END
SELECT * FROM #tempTable
DROP TABLE #tempTable```

## Points of interest

The condition for the `WHILE..LOOP` is (`SELECT COUNT(*)`). This condition statement takes a long time to be evaluated. A faster method would be to calculate how many iterations are needed to generate the desired number of records, i.e., 100,000 records in this case, which is 2^17=131,072, so we can rewrite the code to stop after the 17th iteration.

It took 4 seconds to execute the number count from 1 to 100,0000; the exponential method took two seconds with the code below:

```CREATE TABLE #tempTable([Item ID] [bigint], [Item Name] nvarchar(30))
INSERT INTO #tempTable VALUES (1, 'Hammer')
DECLARE @counter int
SET @counter = 1
WHILE(@counter <= 17)
BEGIN
INSERT INTO #tempTable ([Item ID], [Item Name])
(SELECT [Item ID] + (SELECT COUNT(*) FROM #tempTable),
'Hammer' FROM #tempTable)
SET @counter = @counter + 1
END
SELECT * FROM #tempTable
DROP TABLE #tempTable```

Also, not only can you use this to increment a number field, but it can be applied to datetime fields as well.

## History

• This is v1.0.

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

## About the Author

 United Kingdom
No Biography provided

## You may also be interested in...

 Pro Pro

## Comments and Discussions

 First Prev Next
 My vote of 2 Aborq31-Aug-09 22:50 Aborq 31-Aug-09 22:50
 Re: My vote of 2 weilidai20011-Sep-09 1:03 weilidai2001 1-Sep-09 1:03
 Perhaps, can you suggest a use case in which it needs to be scaled? In my case, I had to emulate a hardware component that produces 5 records per second, and I needed to generate the equivelant amount of test data as if that hardware has been running for 1month, then 3months' worth of data, then 10months' worth. This algorithm worked for me.
 Re: My vote of 2 Aborq3-Sep-09 4:50 Aborq 3-Sep-09 4:50
 Why go row-by-row? Aborq31-Aug-09 22:37 Aborq 31-Aug-09 22:37
 Re: Why go row-by-row? weilidai20011-Sep-09 1:04 weilidai2001 1-Sep-09 1:04
 Could a variation be used to speed up creation of non-trivial records? supercat926-Aug-09 13:10 supercat9 26-Aug-09 13:10
 Re: Could a variation be used to speed up creation of non-trivial records? weilidai200126-Aug-09 14:12 weilidai2001 26-Aug-09 14:12
 Re: Could a variation be used to speed up creation of non-trivial records? supercat927-Aug-09 6:23 supercat9 27-Aug-09 6:23
 Re: Could a variation be used to speed up creation of non-trivial records? Md. Marufuzzaman27-Aug-09 11:26 Md. Marufuzzaman 27-Aug-09 11:26
 Last Visit: 31-Dec-99 19:00     Last Update: 22-Nov-17 14:45 Refresh 1

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.171114.1 | Last Updated 28 Aug 2009
Article Copyright 2009 by weilidai2001
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid