65.9K
CodeProject is changing. Read more.
Home

Determine Holes in IDENTITY Columns

starIconstarIcon
emptyStarIcon
starIcon
emptyStarIconemptyStarIcon

2.75/5 (6 votes)

Sep 29, 2007

CPOL

1 min read

viewsIcon

21752

An article on an efficient method to determine missing values in an IDENTITY() sequence.

Purpose

This basically describes an efficient method for determining holes, or missing values, in a table that contains an IDENTITY() column. As an example, let's say you have a table with the following definition:

CREATE TABLE Main
(
    id int IDENTITIY(1,1),
    someOtherValue varchar(20) NULL
)

INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)

That essentially produces this table structure:

id someOtherValue
1 NULL
2 NULL
3 NULL
4 NULL
5 NULL

Now suppose that you delete rows 2 and 3. You'll be left with IDs 1, 4, and 5. This code essentially tells you that 2 and 3 are missing.

Introduction

Originally, I did some quick research on Google and ran into methods described by pages like this: http://www.sql-server-helper.com/tips/determine-missing-identity-values.aspx. Essentially, they want to create a temporary pivot table containing all of the possible identity values, 1 through MAX(), and perform a LEFT JOIN on the pivot and the table in question. Any NULL values on the main table equates to a missing identity.

That's all well and good, until you want it for something involving a table with more than 10,000 rows.

The Code

SET NOCOUNT ON 

IF OBJECT_ID('tempdb..#Main') IS NOT NULL
    DROP TABLE #Main -- Oops, something went wrong

CREATE TABLE #Main
(
    id int IDENTITY(1,1), 
    someOtherValue varchar(10) NULL
)

INSERT INTO #Main 
SELECT NULL 
UNION ALL SELECT NULL 
UNION ALL SELECT NULL 
UNION ALL SELECT NULL 
UNION ALL SELECT NULL 
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL

SELECT * FROM #Main

DELETE FROM #Main WHERE id IN (1, 2, 4, 5, 7, 9, 10)

SELECT * FROM #Main

DECLARE @temp TABLE (firstMiss int NOT NULL, nextValue int NOT NULL)
DECLARE @maxId INT 
DECLARE @lastId INT
DECLARE @increment INT
DECLARE @lowerBound INT

SELECT @maxId = MAX(id) FROM #Main
SELECT @lastId = IDENT_CURRENT('tempdb..#Main')
SELECT @increment = IDENT_INCR('tempdb..#Main')
SELECT @lowerBound = IDENT_SEED('tempdb..#Main')

-- Get the lower-bound misses
INSERT INTO @temp
SELECT
    @lowerBound, -- lower-bound
    (SELECT TOP 1 id FROM #Main WHERE id > @lowerBound ORDER BY id ASC)

-- Get the middle misses
INSERT INTO @temp
SELECT 
    id + @increment,
    COALESCE(
        (
            SELECT TOP 1 
                id 
            FROM 
                #Main 
            WHERE 
                id > (A.id + @increment) 
            ORDER BY 
                id ASC
        ), 
        @maxId)
FROM 
    #Main A
WHERE 
    id + @increment NOT IN (SELECT id FROM #Main) 
    AND (id + @increment) <= @maxId 

-- Get the upper-bound misses
INSERT INTO @temp
SELECT
    (SELECT TOP 1 id + @increment FROM #Main WHERE id < @lastId ORDER BY id DESC),
    @lastId + @increment -- The next value will be this

SELECT * FROM @temp

DECLARE @missingId TABLE (id int)
DECLARE @maxFirstMiss INT
DECLARE @firstMiss INT
DECLARE @nextValue INT

SELECT @firstMiss = MIN(firstMiss) FROM @temp
SELECT @maxFirstMiss = MAX(firstMiss) FROM @temp

WHILE @firstMiss <= @maxFirstMiss
BEGIN
    SELECT @nextValue = nextValue FROM @temp WHERE firstMiss = @firstMiss

    WHILE @firstMiss < @nextValue
    BEGIN
        INSERT INTO @missingId VALUES (@firstMiss)
        SET @firstMiss = @firstMiss + @increment
    END
        
    SELECT @firstMiss = MIN(firstMiss) FROM @temp WHERE firstMiss > @firstMiss
END

SELECT * FROM @missingId

DROP TABLE #Main

Assumptions

First, I only tested this using the IDENTITY(1,1) scenario. I really don't deal with non-1 increment or seed values. It's coded to work with whatever your table allows, but the logic may not pan out.

Second, your performance may vary depending on how many gaps you've got and how large your tables are. My tests were on tables that had around 300,000 rows, and it worked much faster than the pivot-table method. So, be warned.