Removing Duplicates in a Table based on two Columns

Question

1.24/5 (3 votes)

See more:

C#

Below is my table

Name1	Name2	Distance
BG	CH	280
CH 	BG	280
CH 	HY	350
HY	CH	350
CH 	CBE	500

I want remove duplicates in it How can i do it?

In above table describes distance between two cities, name1 is city1 & name2 is City2

Urgent help and attention is required..,

Thanks in Advance

Posted 1-Dec-15 5:50am

tpkpradeep

Updated 23-Jan-21 19:05pm

Add a Solution

Comments

Sergey Alexandrovich Kryukov 1-Dec-15 14:00pm

Why not preventing adding duplicates in first place?
—SA

tpkpradeep 1-Dec-15 14:21pm

well good question.. :) customers of different type.., :)

5 solutions

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2015-12-01T06:01:00

Solution 1

Complicated...but possible. Have a look here: https://support.microsoft.com/en-us/kb/139444[^] - it explains the process, but I'd probable create an SQL function which took the two names, and returned a composite name that was ordered, so the BG, CH returned the same value as CH, BG. And I hope you have an ID column in there, because you are going to need one!

Posted 1-Dec-15 6:01am

OriginalGriff

Comments

tpkpradeep 1-Dec-15 14:22pm

I'd probable create an SQL function which took the two names, and returned a composite name that was ordered, so the BG, CH returned the same value as CH, BG

How can i perform this? I tried in different ways but i couldn't.., help me this.., please

OriginalGriff 1-Dec-15 14:33pm

You know how to create an SQL function (as opposed to a Stored Procedure) don't you?

tpkpradeep 2-Dec-15 7:40am

hi OriginalGriff
how about this..

select dup1.name1,dup1.tab1,dup1.dist
from
(
select t.rk
from
(
SELECT t1.name1 tab1,t2.name1,dense_rank() over (order by t1.name1) as rk
FROM #tmp_dup t1
inner join #tmp_dup t2 on t1.name1 = t2.name2 and t1.name2 = t2.name1
AND t1.Name1 NOT IN (SELECT name2 FROM #tmp_dup WHERE name2=t1.name1 AND name1 = t2.name2)
group by t1.name1,t2.name1
) as t
group by t.rk
having count(t.rk) = 1
) as dup
left outer join
(
SELECT t1.name1 tab1,t2.name1,dense_rank() over (order by t1.name1) as rk,t1.dist
FROM #tmp_dup t1
inner join #tmp_dup t2 on t1.name1 = t2.name2 and t1.name2 = t2.name1
AND t1.Name1 NOT IN (SELECT name2 FROM #tmp_dup WHERE name2=t1.name1 AND name1 = t2.name2)
group by t1.name1,t2.name1,t1.dist
) as dup1 on dup.rk = dup1.rk

tpkpradeep 1-Dec-15 15:26pm

:) I know..,

I tried this

with cte as(
select name1,name2,
row_number() over
(
order by name1,name2
) as cnt
from
#tmp_dup
group by name1,name2
)

select t1.name1,t1.name2,t1.rk
from
(
select a.name1,a.name2,a.cnt as cnt,
dense_rank() over (order by a.name1) as rk
from cte a
where exists(
select 1 from cte b where a.name1 = b.name2 and a.name2 = b.name1)
) t1
group by t1.name1,t1.name2,t1.rk

i got output as below

name1 name2 rk
BG CH 1
CH BG 2
CH HY 2
HY CH 3

but i got strucked removing the value 2

Member 7870345 · Answer 2 · 2018-04-10T23:31:00

The solution consists in first place to find exactly where the "bad registers" are. If we see the records (BG, CH, 280) and (CH,BG,280), the problems consists that are "equal" if they were inserted always both fields in alfabetical order that is (BG, CH,280). So a record is not good in first opinion, if name2 is
previous to name1.

The query:

SQL

SELECT 
	CASE WHEN t1.name1<>(CASE WHEN t1.name1<t1.name2 THEN t1.name1 ELSE t1.name2 end) THEN 'bad register' else 'good registerk' end as clase
	,t1.*
  FROM table1 t1

says that (CH, BG, 280), (HY,CH, 350) are 'wrong register' (what is right), but also says that (CH,CBE, 500) is a 'wrong register' (what is worng).
This last record really isnt´t a 'wrong register' because there isn´t a 'mirror' register (CBE, CH, 250)

So if we use the query:

SQL

SELECT 
	CASE WHEN t1.name1<>(CASE WHEN t1.name1<t1.name2 THEN t1.name1 ELSE t1.name2 end) THEN 'wrong register' else 'ok register' end as clase
	,t1.*
  FROM table1 t1
  JOIN table1 t2
  ON t1.name1=t2.name2 AND t1.name2=t2.name1

we only have the real 'wrong register' (because if there is no 'mirrow' register it says that it´s a 'ok register')

Then, the query:

SQL

SELECT t1.*
  FROM table1 t1
  JOIN table1 t2
  ON t1.name1=t2.name2 AND t1.name2=t2.name1
  WHERE t1.name1<>(CASE WHEN t1.name1<t1.name2 THEN t1.name1 ELSE t1.name2 end)

just list the records that you need to delete.

or simplier:

SQL

SELECT t1.*
  FROM table1 t1
  JOIN table1 t2
  ON t1.name1=t2.name2 AND t1.name2=t2.name1
  WHERE t1.name1>t1.name2

Finaly the order:

SQL

delete TABLEdELETE
	FROM table1 tableDelete
	inner JOIN table1 t1
		on tabledelete.name1=t1.name1 and tabledelete.name2=t1.name2	
	inner JOIN  table1 t2
		ON ( t1.name1>t1.name2 AND t1.name1=t2.name2 AND t1.name2=t2.name1 )

deletes the register that you want.

Note: If your data are:
BG CH 1
CH BG 2
CH HY 2
HY CH 3

and you don´t want that the record CH, BG, 2 will be delete then use:

SQL

delete TABLEdELETE
	FROM table1 tableDelete
	inner JOIN table1 t1
		on tabledelete.name1=t1.name1 and tabledelete.name2=t1.name2 	
	inner JOIN  table1 t2
		ON ( t1.name1>t1.name2 AND t1.name1=t2.name2 AND t1.name2=t2.name1 AND t1.distance=t2.distance )

Member 14676000 · Answer 3 · 2019-12-02T18:13:00

Solution 5

delete from table1 t1
WHERE t1.name1>t1.name2

Posted 2-Dec-19 18:13pm

Member 14676000

Comments

CHill60 3-Dec-19 5:53am

So, if 4 years ago the OP had a table that contained

Name1	Name2	Distance
BG	CH	280
CH 	BG	280
HY 	CH	350
CH 	CBE	500

You would have them remove the HY + CH line, despite the fact it is not duplicated.

Member 14676000 · Answer 4 · 2019-12-03T20:38:00

Solution 6

select * from table1 where rowid in (select t1.rowid from table1 t1,table1 t2 where t1.Name1>t1.Name2 and t1.Name1=t2.Name2 and t2.Name1=t1.Name2 );

delete from table1 where rowid in (select t1.rowid from table1 t1,table1 t2 where t1.Name1>t1.Name2 and t1.Name1=t2.Name2 and t2.Name1=t1.Name2 );

Posted 3-Dec-19 20:38pm

Member 14676000

Comments

CHill60 4-Dec-19 3:46am

You posted a solution yesterday. Do not post multiple solutions to the same question - it is confusing for everyone and just looks like rep-point farming.
This solution is no better than your original one but is starting to look like a copy of Solution 4.
Always read the previous solutions and make sure you are bringing something new to the thread … unlikely after 4 years has passed.

Member 14676000 5-Dec-19 10:06am

Above solution 4 is not a copy of solution 4,it is improved solution of 4 ,they never used rowid concept

Member 15054427 · Answer 5 · 2021-01-23T19:05:00

Solution 7

I have come up with this below SQL query and it works.The main idea is to sort the values in the two columns.

SELECT Name1,Name2,distance from name_tbl INNER JOIN(
SELECT DISTINCT CASE WHEN name1<name2 THEN name1 ELSE name2 END as firstName,
CASE WHEN name1<name2 THEN name2 ELSE name1 END as secondName
FROM name_tbl
)t1
ON name_tbl.name1 = t1.firstName AND name_tbl.name2 = t1.secondName

Posted 23-Jan-21 19:05pm

Member 15054427

Updated 23-Jan-21 19:08pm

v2

Comments

CHill60 25-Jan-21 4:00am

I suspect the reason you are being downvoted is because the principle is no different to the "principle" given in Solutions 1, 4 and 6. You have just used a sub-query instead of a function or CTE.
I applaud your desire to help, but I would stick to answering newer posts, where the OP still needs help