Click here to Skip to main content
Click here to Skip to main content

How to find duplicate values in a table.

By , 16 Nov 2010
 

Duplicate values in tables can create a major problem when we do not make a primary key or a unique key on a table. In these cases we need to find out the duplicate records and need to delete them. We can use the Having Clause to find out that duplicate records.we show this with the help of an example.

Suppose we have a table named EmployeeDeptInfo which have the column Employeid and Departmentid. The query for creating this table is given below:

Create table EmployeeDeptInfo ( Employeeid int, Departmentid int)

Employeeid contains the Id of the employee and Departmentid contains the Id of the department to which he/she is belonged. This table do not have any primary key defined on it and also it doesnt have any unique key constraint defined on any column.

Suppose the table contains the following data

Employeeid Departmentid
1 1
2 2
3 2
4 3
3 2
2 2
5 4
2 2

In this table, entries for the employee having employeeid 2 & 3 are get repeated.In this example the data in the table is not much for the example point of view but in real time scenario it can be billions of rows and duplication of rows can be a very big concern. Therefore it is necessary to find out the duplicate rows in the table. We can use the Having Clause to find out the duplicate rows.

Query for finding out the duplicate rows in the table:

Select Employeeid, Departmentid from EmployeeDeptInfo Group By Employeeid, 
    DepartmentId having (count(employeeid) >1 and count (Departmentid) >1)

This query will give us the following results:

Employeeid Departmentid
2 2
3 2

We can also find out that how many times a record is repeated in the table. For example, The following query gives us the result of how many times a given record is repeated

Select Employeeid, Departmentid, 
    count (Employeeid) as NoOfRepeat from EmployeeDeptInfo Group By Employeeid, 
    DepartmentId having (count(employeeid) >1 and count (Departmentid) >1)

This query will give us the following results

Employeeid Departmentid NoOfRepeat
2 2 3
3 2 2

Here NoOfRepeat shows the number of times the records are duplicated in the table

Summary

This article shows that we can find the duplicate records with the help of the Group By and Having Clause.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Vivek Johari
Software Developer (Senior) Magic Software Pvt Ltd
India India
Member
I am currently working as a Analyst and have around 6 yeras of experience in database.Currently, I am working in Magic Software Pvt.Ltd. in Noida.
 
Degree:-
Master Degree in Computer(MCA)
 
Work experience:-
Designing of the database.
Database Optimization.
Writing Complex Stored Procedures,Functions,Triggers etc.
Designing and developing SSIS & DTS packages.
Designing SQL Reports using SSRS.
Database Server Maintenance.
 
Certification:-
Microsoft certified Sql DBA in Sql server 2008 (MCTS).
Microsoft certified BI professional in Sql server 2008 (MCTS).
Oracle certified profession DBA in ORACLE 10g (OCP)
certified profession DBA in ORACLE 9i (OCP)
 
My other publication
Technical Blog:- Technologies with Vivek Johari
 
Moderator and Blogger at BeyondRelational.com
 
Guest Author and Blogger at sqlservercentral.com

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralMy vote of 5memberOshtri Deka5 May '11 - 1:42 
This should be learned on same day when you write your first SELECT query! This is so simple and useful, but neglected.
My 5 for promoting knowledge.
GeneralRe: My vote of 5memberVivek Johari3 Nov '11 - 1:29 
Thanks Oshtri Smile | :)
GeneralMy vote of 3memberBloodyBaron30 Jan '11 - 8:25 
Explaining how to remove the duplicate occurrences would be welcome.
GeneralRe: My vote of 3memberVivek Johari3 Nov '11 - 1:29 
Thanks Smile | :)
GeneralMy vote of 3memberDilip Baboo22 Nov '10 - 6:32 
How to delete them would be great too !
GeneralRe: My vote of 3memberFlamewave422 Nov '10 - 8:43 
This should do the trick (for MS SQL 2005 & 2008, should work with 2000 too but I don't have access to a 2000 instance to test on):
 
declare @col1 <column_type>
declare @col2 <column_type>
 
set rowcount 1
while exists (select col1, col2 from myTable group by col1, col2 having count(*) > 1)
    begin
    select top 1 @col1 = col1, @col2 = col2 from myTable group by col1, col2 having count(*) > 1
    delete from myTable where col1 = @col1 and col2 = @col2
    end
set rowcount 0
 
You can group on as many columns as you need to by simply adding more columns (and declaring variables for them). Be sure to change <column_type> to the proper data type for that column.
GeneralExcellent TipmemberFlamewave422 Nov '10 - 5:22 
Excellent tip, I've had to use this approach many times when normalizing or re-structuring an old database. It's a pain to do, but this approach is by far the easiest one I've come across.
 
Related article you may be interested in:
How to remove duplicate rows from a table in SQL Server (MS KB article)
 
Also, you can simplify your having clause. You don't need to have a statement for each column, you can simply do: having count(*) > 1
GeneralRe: Excellent TipmemberVivek Johari22 Nov '10 - 8:04 
Thanks.....

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 16 Nov 2010
Article Copyright 2010 by Vivek Johari
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid