delete all duplicate rows in a table (if a row is duplicated once, both rows should be deleted)

Question

1.00/5 (1 vote)

See more:

SQL-Server-2005

My scenario is this:

I have two tables table1 and table2.
Both of them are identical except that table2 has no primary key or constraints.

I need to copy data from table2 to table1.

table1 does not accept duplicate rows.

The primary key in table1 is based on 5 columns col1, col2, col3, col4, col5.

table2 has a bunch of duplicated rows.

I am able to get the duplicated rows in table2 using:

SQL

SELECT count(*) as 'Occurance'col1,col2,col3,col4, col5
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(*) > 1

And I am able to delete one duplicate using the query below:

SQL

WITH cte
AS (SELECT col1,col2,col3,col4, col5,
row_number() OVER(PARTITION BY col1,col2,col3,col4, col5 ORDER BY col1) AS [rn]
  FROM table2
)
DELETE cte WHERE [rn] <> 1

But what I want is to delete all the duplicates which are found.

To make it clear there are two rows:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = TEXAS
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = NEW YORK

So the first query posted above will give me a result as:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, Occurance = 2

The second query will delete any one of the above (i.e. either TEXAS or NEW YORK).
I want to delete both TEXAS and NEW YORK.

Posted 5-Nov-14 22:29pm

getbacktosrinu

Add a Solution

Comments

jaket-cp 6-Nov-14 4:42am

I assume table1 has col6.
In your example, for col6 - what value will do you want to be inserted into table1 TEXAS or NEW YORK?

jaket-cp 6-Nov-14 5:06am

If I get what you are saying correctly.
From your example the NEW YORK record in table2, you want that to be inserted into table1.
This will not work if col1, col2, col3, col4, col5 are the pk for table1, the insert into table1 will not work as the col1, col2, col3, col4, col5 values for the NEW YORK record in table2 already exists in table1.

getbacktosrinu 6-Nov-14 5:18am

Yes that is correct

jaket-cp 6-Nov-14 5:25am

If that is the case you either need to remove the pk constraint in table1 or change the way you want the data to be store.
Possibly to create a new table with a one to many relationship to store col6 and remove col6 from table1.

Maciej Los 6-Nov-14 4:57am

Why col6 has been omitted?

4 solutions

Solution 1

I am not sure why you need to delete rows before you copy them to Table1.

What I would suggest is get the DISTINCT rows from Table2 and then insert them in Table1

SQL

INSERT INTO Table1
SELECT DISTINCT Column1, Column2, Column3 Column4, Column5 FROM Table2

Posted 5-Nov-14 22:32pm

Manas Bhardwaj

Updated 5-Nov-14 22:33pm

v2

Comments

King Fisher 6-Nov-14 4:45am

good Solution my 5+ :)

Manas Bhardwaj 6-Nov-14 7:29am

thx!

DamithSL 6-Nov-14 4:53am

distinct will not work, OP want to remove all duplicates, don't need to keep one record from each duplicate (as far as I understood)

getbacktosrinu 6-Nov-14 5:05am

you are correct sir.

getbacktosrinu 6-Nov-14 4:59am

Okay my tables are this way:

table1 has x number of rows.
table2 has x + y number of rows i.e. it has all the rows of table1 plus some.

Taking the above example,
TEXAS is available in both table1 and table2.
And TEXAS occurs twice in table2.
But NEW YORK is available in only table2.

When I copy the data from table2 to table1 only NEW YORK can be inserted (should be inserted).

With your query TEXAS will still be there (occurs once) in table2 and the insert will fail.

Solution 2

check this DEMO[^]
you can use SQL INSERT INTO SELECT Statement[^] with group by

Posted 5-Nov-14 23:21pm

DamithSL

Comments

getbacktosrinu 6-Nov-14 5:36am

Sorry but that doesn't quite cut it.
Already tried it.
It will be the same as DISTINCT.

What I want is this:
delete rows which have count(*) > 1
so the next time the count(*) = 0 (not 1. 1 implies a row still exists).

DamithSL 6-Nov-14 5:45am

it is not same as distinct, note that row no3 is the only record which not duplicate and you will get that record inserted to table1

getbacktosrinu 6-Nov-14 5:50am

In the demo which you shared above the signatures of the tables are different.
But my tables are the same.
I mentioned this above:

Both of them are identical except that table2 has no primary key or constraints.

The query in the demo was:
INSERT INTO Table1
SELECT [col2], [col3], [col4], [col5], [col6], [col7]
FROM Table2
GROUP BY [col1], [col2], [col3], [col4], [col5], [col6], [col7]
HAVING count(*) = 1;

But I need
INSERT INTO Table1
SELECT *
FROM Table2
GROUP BY [col1], [col2], [col3], [col4], [col5], [col6], [col7]
HAVING count(*) = 1;

DamithSL 6-Nov-14 6:22am

http://sqlfiddle.com/#!6/bebb5/1[^]

Solution 4

You may try this also

SQL

SELECT col1,col2,col3,col4, col5
into #Duplicates
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(1) > 1 

delete from table2 tbl2  where exists(select col1,col2,col3,col4, col5 from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2  and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)

Posted 6-Nov-14 0:59am

Shweta N Mishra

Comments

getbacktosrinu 6-Nov-14 7:10am

(already worked using the above solution... will be a month before I get data again like that)

I haven't tried this but I want all the data in the rows.

So will your queries work like this?
SELECT *
into #Duplicates
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(1) > 1

delete from table2 tbl2 where exists(select * from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2 and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)

Shweta N Mishra 6-Nov-14 7:16am

No you can not use select * , But i see what you want to confirm. You want to see the duplicate records

You can check that by below query which is the same command as of delete, Only you to use select instead of delete and if you want you can dump the records into a temporary table.

select * from table2 tbl2 where exists(select * from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2 and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

jaket-cp · Accepted Answer · 2014-11-06T00:01:00

If I understand your requirements correctly.
The primary key constraint on table1 needs to be dropped.

SQL

--data setup for testing
declare @table2 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
declare @table1 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
--values for @table2
insert into @table2 values(1,1,1,1,1,'TEXAS');
insert into @table2 values(1,1,1,1,1,'NEW YORK');
insert into @table2 values(1,1,1,1,1,'IOWA');
insert into @table2 values(1,1,1,1,2,'TEXAS');
insert into @table2 values(1,1,1,1,2,'NEW YORK');
insert into @table2 values(1,1,1,1,2,'IOWA');
insert into @table2 values(1,1,1,2,1,'TEXAS');
insert into @table2 values(1,1,1,2,1,'NEW YORK');
insert into @table2 values(1,1,1,2,1,'IOWA');
insert into @table2 values(1,1,2,1,1,'TEXAS');
insert into @table2 values(1,1,2,1,1,'NEW YORK');
insert into @table2 values(1,1,2,1,1,'IOWA');
insert into @table2 values(1,2,1,1,1,'TEXAS');
insert into @table2 values(1,2,1,1,1,'NEW YORK');
insert into @table2 values(1,2,1,1,1,'IOWA');
insert into @table2 values(2,1,1,1,1,'TEXAS');
insert into @table2 values(2,1,1,1,1,'NEW YORK');
insert into @table2 values(2,1,1,1,1,'IOWA');
--values for @table1
insert into @table1 values(1,1,1,1,2,'TEXAS');
insert into @table1 values(1,2,1,1,1,'TEXAS');

SQL

--insert into @table1 from @table2 but not exist in @table1
insert into @table1
select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
from @table2 b
where not exists(
    select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
    from @table1 a
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
);

This one can be run before and after the insert

SQL

--remove from @table2 where already in @table1
with ToRemove as(
    select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
    from @table2 b
    where exists(
        select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
        from @table1 a
        where a.col1 = b.col1
        and a.col2 = b.col2
        and a.col3 = b.col3
        and a.col4 = b.col4
        and a.col5 = b.col5
        and a.col6 = b.col6
    )
)
delete from ToRemove;

I didn't know the CTE could be used in this way for a delete :) thanks for that one.

I suggest you investigate one to many relations in database and maybe change the way the data is stored.
Have a read of http://www.databaseprimer.com/pages/relationship_1tox/[^]

delete all duplicate rows in a table (if a row is duplicated once, both rows should be deleted)

4 solutions

Solution 1

Solution 2

Solution 3

Solution 4

Add your solution here

Preview 0

Existing Members

...or Join us