Click here to Skip to main content
15,885,141 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
My scenario is this:

I have two tables table1 and table2.
Both of them are identical except that table2 has no primary key or constraints.

I need to copy data from table2 to table1.

table1 does not accept duplicate rows.

The primary key in table1 is based on 5 columns col1, col2, col3, col4, col5.

table2 has a bunch of duplicated rows.

I am able to get the duplicated rows in table2 using:
SQL
SELECT count(*) as 'Occurance'col1,col2,col3,col4, col5
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(*) > 1


And I am able to delete one duplicate using the query below:
SQL
WITH cte
AS (SELECT col1,col2,col3,col4, col5,
row_number() OVER(PARTITION BY col1,col2,col3,col4, col5 ORDER BY col1) AS [rn]
  FROM table2
)
DELETE cte WHERE [rn] <> 1


But what I want is to delete all the duplicates which are found.

To make it clear there are two rows:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = TEXAS
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, col6 = NEW YORK

So the first query posted above will give me a result as:
col1 = 1, col2 = 2, col3 = 3, col4 = 4, col5 = 5, Occurance = 2

The second query will delete any one of the above (i.e. either TEXAS or NEW YORK).
I want to delete both TEXAS and NEW YORK.
Posted
Comments
jaket-cp 6-Nov-14 4:42am    
I assume table1 has col6.
In your example, for col6 - what value will do you want to be inserted into table1 TEXAS or NEW YORK?
jaket-cp 6-Nov-14 5:06am    
If I get what you are saying correctly.
From your example the NEW YORK record in table2, you want that to be inserted into table1.
This will not work if col1, col2, col3, col4, col5 are the pk for table1, the insert into table1 will not work as the col1, col2, col3, col4, col5 values for the NEW YORK record in table2 already exists in table1.
getbacktosrinu 6-Nov-14 5:18am    
Yes that is correct
jaket-cp 6-Nov-14 5:25am    
If that is the case you either need to remove the pk constraint in table1 or change the way you want the data to be store.
Possibly to create a new table with a one to many relationship to store col6 and remove col6 from table1.
Maciej Los 6-Nov-14 4:57am    
Why col6 has been omitted?

I am not sure why you need to delete rows before you copy them to Table1.

What I would suggest is get the DISTINCT rows from Table2 and then insert them in Table1
SQL
INSERT INTO Table1
SELECT DISTINCT Column1, Column2, Column3 Column4, Column5 FROM Table2
 
Share this answer
 
v2
Comments
King Fisher 6-Nov-14 4:45am    
good Solution my 5+ :)
Manas Bhardwaj 6-Nov-14 7:29am    
thx!
DamithSL 6-Nov-14 4:53am    
distinct will not work, OP want to remove all duplicates, don't need to keep one record from each duplicate (as far as I understood)
getbacktosrinu 6-Nov-14 5:05am    
you are correct sir.
getbacktosrinu 6-Nov-14 4:59am    
Okay my tables are this way:

table1 has x number of rows.
table2 has x + y number of rows i.e. it has all the rows of table1 plus some.

Taking the above example,
TEXAS is available in both table1 and table2.
And TEXAS occurs twice in table2.
But NEW YORK is available in only table2.

When I copy the data from table2 to table1 only NEW YORK can be inserted (should be inserted).

With your query TEXAS will still be there (occurs once) in table2 and the insert will fail.
check this DEMO[^]
you can use SQL INSERT INTO SELECT Statement[^] with group by
 
Share this answer
 
Comments
getbacktosrinu 6-Nov-14 5:36am    
Sorry but that doesn't quite cut it.
Already tried it.
It will be the same as DISTINCT.

What I want is this:
delete rows which have count(*) > 1
so the next time the count(*) = 0 (not 1. 1 implies a row still exists).
DamithSL 6-Nov-14 5:45am    
it is not same as distinct, note that row no3 is the only record which not duplicate and you will get that record inserted to table1
getbacktosrinu 6-Nov-14 5:50am    
In the demo which you shared above the signatures of the tables are different.
But my tables are the same.
I mentioned this above:

Both of them are identical except that table2 has no primary key or constraints.

The query in the demo was:
INSERT INTO Table1
SELECT [col2], [col3], [col4], [col5], [col6], [col7]
FROM Table2
GROUP BY [col1], [col2], [col3], [col4], [col5], [col6], [col7]
HAVING count(*) = 1;

But I need
INSERT INTO Table1
SELECT *
FROM Table2
GROUP BY [col1], [col2], [col3], [col4], [col5], [col6], [col7]
HAVING count(*) = 1;
If I understand your requirements correctly.
The primary key constraint on table1 needs to be dropped.
SQL
--data setup for testing
declare @table2 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
declare @table1 table(col1 int, col2 int, col3 int, col4 int, col5 int, col6 varchar(10));
--values for @table2
insert into @table2 values(1,1,1,1,1,'TEXAS');
insert into @table2 values(1,1,1,1,1,'NEW YORK');
insert into @table2 values(1,1,1,1,1,'IOWA');
insert into @table2 values(1,1,1,1,2,'TEXAS');
insert into @table2 values(1,1,1,1,2,'NEW YORK');
insert into @table2 values(1,1,1,1,2,'IOWA');
insert into @table2 values(1,1,1,2,1,'TEXAS');
insert into @table2 values(1,1,1,2,1,'NEW YORK');
insert into @table2 values(1,1,1,2,1,'IOWA');
insert into @table2 values(1,1,2,1,1,'TEXAS');
insert into @table2 values(1,1,2,1,1,'NEW YORK');
insert into @table2 values(1,1,2,1,1,'IOWA');
insert into @table2 values(1,2,1,1,1,'TEXAS');
insert into @table2 values(1,2,1,1,1,'NEW YORK');
insert into @table2 values(1,2,1,1,1,'IOWA');
insert into @table2 values(2,1,1,1,1,'TEXAS');
insert into @table2 values(2,1,1,1,1,'NEW YORK');
insert into @table2 values(2,1,1,1,1,'IOWA');
--values for @table1
insert into @table1 values(1,1,1,1,2,'TEXAS');
insert into @table1 values(1,2,1,1,1,'TEXAS');

SQL
--insert into @table1 from @table2 but not exist in @table1
insert into @table1
select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
from @table2 b
where not exists(
    select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
    from @table1 a
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
);

This one can be run before and after the insert
SQL
--remove from @table2 where already in @table1
with ToRemove as(
    select b.col1, b.col2, b.col3, b.col4, b.col5, b.col6
    from @table2 b
    where exists(
        select a.col1, a.col2, a.col3, a.col4, a.col5, a.col6
        from @table1 a
        where a.col1 = b.col1
        and a.col2 = b.col2
        and a.col3 = b.col3
        and a.col4 = b.col4
        and a.col5 = b.col5
        and a.col6 = b.col6
    )
)
delete from ToRemove;

I didn't know the CTE could be used in this way for a delete :) thanks for that one.

I suggest you investigate one to many relations in database and maybe change the way the data is stored.
Have a read of http://www.databaseprimer.com/pages/relationship_1tox/[^]
 
Share this answer
 
Comments
getbacktosrinu 6-Nov-14 6:28am    
That worked.
Thanks for that.
jaket-cp 6-Nov-14 6:32am    
no problem
Rajesh waran 6-Nov-14 7:00am    
Good try.
You may try this also

SQL
SELECT col1,col2,col3,col4, col5
into #Duplicates
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(1) > 1 

delete from table2 tbl2  where exists(select col1,col2,col3,col4, col5 from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2  and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)
 
Share this answer
 
Comments
getbacktosrinu 6-Nov-14 7:10am    
(already worked using the above solution... will be a month before I get data again like that)

I haven't tried this but I want all the data in the rows.

So will your queries work like this?
SELECT *
into #Duplicates
FROM table2
GROUP BY col1,col2,col3,col4, col5
HAVING count(1) > 1

delete from table2 tbl2 where exists(select * from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2 and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)
Shweta N Mishra 6-Nov-14 7:16am    
No you can not use select * , But i see what you want to confirm. You want to see the duplicate records

You can check that by below query which is the same command as of delete, Only you to use select instead of delete and if you want you can dump the records into a temporary table.

select * from table2 tbl2 where exists(select * from #Duplicates dup
where tbl2.col1=dup.col1 and tbl2.col2=dup.col2 and tbl2.col3=dup.col3 and tbl2.col4=dup.col4 and tbl2.col5=dup.col5
)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900