Click here to Skip to main content
Click here to Skip to main content

Remove Duplicate Rows from a Table in SQL Server

By , 15 Feb 2011
 

Introduction

Most of the times, we use primary key or unique key for preventing insertion of duplicate rows in SQL Server. But if we don't use these keys, then it's obvious that duplicate rows could be entered by the user. After inserting duplicate rows into table, it becomes a major issue to delete those duplicate rows. In that time, we need to delete those duplicate rows to resolve the issue. So this topic will help us to delete those duplicate rows from the specific table.

Background

I used some basic T-SQL code to accomplish the target. So you don't need to worry to understand this code.

Problem

Firstly, we will create a table, where we will insert some duplicate rows to understand the topic properly. Create a table called ATTENDANCE by using the following code:

CREATE TABLE [dbo].[ATTENDANCE](
	[EMPLOYEE_ID] [varchar](50) NOT NULL,
	[ATTENDANCE_DATE] [date] NOT NULL
) ON [PRIMARY]   

Now insert some data into this table.

INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A001',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A001',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
			('A003',CONVERT(DATETIME,'01-01-11',5)) 

After inserting the data, check the data of the below table. If we grouped the employee_id and attendance_date, then A001 and A002 become duplicates.

EMPLOYEE_ID ATTENDANCE_DATE
A001 2011-01-01
A001 2011-01-01
A002 2011-01-01
A002 2011-01-01
A002 2011-01-01
A003 2011-01-01

So how can we delete those duplicate data?

Solution

First, insert an identity column in that table by using the following code:

ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)  

Now the table data will be like the following table:

EMPLOYEE_ID ATTENDANCE_DATE AUTOID
A001 2011-01-01 1
A001 2011-01-01 2
A002 2011-01-01 3
A002 2011-01-01 4
A002 2011-01-01 5
A003 2011-01-01 6

Check the AUTOID column. Now we will start playing the game with this column.

Now use the following code to find out the duplicate rows that exist in the table.

SELECT * FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
	FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)  

The above code will give us the following result:

EMPLOYEE_ID ATTENDANCE_DATE AUTOID
A001 2011-01-01 2
A002 2011-01-01 4
A002 2011-01-01 5

Ultimately, these are the duplicate rows which we want to delete to resolve the issue. Use the following code to resolve it.

DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
	FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE) 

Now check the data. No duplicate rows exist in the table.

Is it too complicated?

History

  • 16th February, 2011: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Sadeque Sharif
Software Developer Meridian Group
Bangladesh Bangladesh
Member
I am:
Software Administrator at Meridian Group.
 
I was:
Assistant Manager (Software Division) at KDS Garment Industries Limited.
Assistant Manager (Software Division) at E-Vision Software Limited.
 
My blog:
crea8ivecode
 
My preferred work area:
ASP.NET & SQL SERVER.
 
My email:
sadeque.sharif@yahoo.com
 
Follow me:
twitter | facebook | linkedin

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
Answerremove duplicate row from oraclememberRakesh Sinha31 Oct '12 - 7:05 
Here i found useful link, how to remove duplicate row from oracle:
http://www.dotnetcodes.com/dotnetcodes/code/Source-Codes-74-How-to-remove-duplicate-rows-from-table-in-Oracle-using-PL-SQL.aspx[^]
Rakesh Sinha
Delhi

GeneralRe: remove duplicate row from oraclememberSadeque Sharif31 Oct '12 - 18:05 
Did you found any difference?
GeneralMy vote of 5membervivek.iter27 Sep '12 - 23:25 
nice explanation with Image.
it realy work dude.
GeneralRe: My vote of 5memberSadeque Sharif14 Oct '12 - 0:08 
Thanks.
GeneralMy vote of 5memberMember 792008421 Jul '11 - 9:38 
Your answer is really superb
GeneralUse the physicalmemberMember 295683825 Feb '11 - 1:46 
As suggested in the article
<a href="http://www.codeproject.com/KB/database/PhysLocOfARowInSqlServer.aspx">Physical location of a row in SQL Server</a>
You can use the physical id for index instead of creating one
DELETE
FROM dbo.ATTENDANCE
WHERE dbo.ATTENDANCE.%%physloc%%
NOT IN (SELECT MIN(b.%%physloc%%)
FROM dbo.ATTENDANCE
 
For SQl 2008
 
or
 
DELETE
FROM dbo.ATTENDANCE
WHERE dbo.ATTENDANCE.%%lockres%%
NOT IN (SELECT MIN(b.%%lockres%%)
FROM dbo.ATTENDANCE
 
For SQL 2005
GeneralhelpfullmemberPranay Rana24 Feb '11 - 2:11 
helpfull

GeneralYet another waymemberfgrimm21 Feb '11 - 22:24 
Since "WHERE x IN SELECT" does the SELECT for each row
you have done 6 inner selects .... this is not very fast if you have a "real" table with more rows
you can also do a LEFT JOIN and test against NULL..
 
SELECT to delete:
SELECT a.AUTOID
FROM dbo.ATTENDANCE AS a
LEFT JOIN (
SELECT MIN(AUTOID) as MINAUTOID
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE
) as b
ON a.AUTOID = b.MINAUTOID
WHERE b.MINAUTOID IS NULL

 
DELETE:
DELETE FROM dbo.ATTENDANCE
FROM dbo.ATTENDANCE AS a
LEFT JOIN (
SELECT MIN(AUTOID) as MINAUTOID
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE
) as b
ON a.AUTOID = b.MINAUTOID
WHERE b.MINAUTOID IS NULL

 
and don't forget to use index
GeneralMy vote of 3memberMoeez Shahid21 Feb '11 - 22:23 
Good.
QuestionOne more attempt to reinvent the wheel?memberVictor Nijegorodov21 Feb '11 - 22:11 
Well, it's good that you could solve this problem. Perhaps, you also think you did it your own "original" way. However, AFAIK this problem exists since the time the first relational database was developed. And this problem had been solved a lot of times with one or another way...
 
So, what can your way do that no other from this list http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=Remove+duplicate+rows+from+a+table[^] including the Microsoft http://support.microsoft.com/kb/139444[^] could not?

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web01 | 2.6.130523.1 | Last Updated 16 Feb 2011
Article Copyright 2011 by Sadeque Sharif
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid