Select duplicate records

Question

Hi,

I have a query below

declare @tbl table (id int identity(1,1),sname varchar(10),marks int)
insert into @tbl values('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)
select row_number()over(partition by sname,marks order by marks)sno,sname,marks from @tbl

my result is:
sno sname marks
1 A 9
1 A 10
2 A 10
1 B 10
1 B 40
1 C 10
1 D 15

But I want only the second and third rows, as these are the duplicate records. I want all duplicate records, irrespective of how many times they are repeated.

Please suggest

Thanks

What I have tried:

Duplicate record elimination

Posted 25-Jul-18 21:00pm

Member 13867163

Updated 25-Jul-18 23:07pm

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

CHill60 · Answer 1 · 2018-07-25T23:08:00

Given that sno is going to be 1 for every distinct combination (because you are partitioning by sname AND marks) and only goes above 1 for repeated rows, you could simplify this to

SQL

select distinct 1 as sno,sname, marks from @tbl

The key point being that it is your introduction of the row_number() that is actually causing the issue.

But I suspect you want sno to be something meaningful (correct me if I'm wrong)

If you want it to show the number of the mark per name then use

SQL

select row_number() over (partition by sname order by sname,marks) as sno, sname, marks
FROM (select distinct 1 as sno,sname, marks from @tbl) A

which will give you

sno	sname	marks
1	A	9
2	A	10
1	B	10
2	B	40
1	C	10
1	D	15

If you want to give each row a number then remove the partition altogether

SQL

select row_number() over (order by sname,marks) as sno, sname, marks
FROM (select distinct sname, marks from @tbl) A

sno	sname	marks
1	A	9
2	A	10
3	B	10
4	B	40
5	C	10
6	D	15

------------------ EDIT AFTER OP COMMENT ---------------

To only get the lines that are duplicated there are two ways you could try:

1. To get the sname and marks and the number of duplications try this

SQL

select sname, marks, max(sno) as numberOfDups from 
(select row_number() over (partition by sname, marks order by sname,marks) as sno, sname, marks
FROM  @tbl) A
where sno > 1
group by sname, marks

2. To get the full list of items that have duplicates try this:

SQL

;with cte as
(
	select row_number() over (partition by sname, marks order by sname,marks) as sno, sname, marks
	from @tbl
)
select * 
from cte 
inner join (select sname, marks, max(sno) as numberOfDup from cte where sno > 1 group by sname, marks) B on cte.sname = B.sname AND cte.marks = B.marks

The key is the INNER JOIN on a sub-query that looks at the CTE a second time. It's not just a simple "give me the rows where sno > 1" - the join ensures that, while we don't see the non-duplicated rows, we do see all the rows where there has been a duplicate - i.e. A 10 where sno = 1 AND 2, not just 2
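As a possible alternative to the two queries above, the same results can probably be had without row_number(): a GROUP BY with HAVING COUNT(*) > 1 for the summary, and a windowed COUNT(*) for the full list. This is only a sketch against the sample data from the question; the names occurrences and counted are my own, not part of the original query.

```sql
-- Sample data copied from the question
declare @tbl table (id int identity(1,1), sname varchar(10), marks int)
insert into @tbl values ('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)

-- 1. One row per duplicated (sname, marks) pair, with the number of times it occurs
select sname, marks, count(*) as occurrences
from @tbl
group by sname, marks
having count(*) > 1

-- 2. Every row that belongs to a duplicated (sname, marks) pair
;with counted as
(
	select id, sname, marks,
	       count(*) over (partition by sname, marks) as occurrences
	from @tbl
)
select id, sname, marks, occurrences
from counted
where occurrences > 1
```

With the sample data, the first query should return the single pair A 10 with a count of 2, and the second should return both A 10 rows. Because the count is attached to every row of the partition, no join back to the table is needed.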

------------------ EDIT AFTER ANOTHER OP COMMENT ---------------

Introducing other columns which must be taken into account - the OP reports that the joins take "much time" for the ON conditions

Test data:

SQL

declare @tbl table (id int identity(1,1),sname varchar(10),marks int, other int)
insert into @tbl values('A',10,1),('B',10,1),('A',10,1),('B',40,1),('C',10,2),('D',15,2),('A',9,1), ('A',10,2)

My original solution would return

1	A	10	2	3
2	A	10	1	3
3	A	10	1	3

Which is incorrect when taking into account the other column.

We can use Checksum to "combine" all of the columns that are relevant e.g.

SQL

;with cte1 as 
(
	select checksum(sname, marks, other) as sno, sname, marks, other from @tbl
)
select sname, marks, other
from cte1
inner join (select checksum(sname, marks, other) as sno from @tbl group by checksum(sname, marks, other) having count(*) > 1) A on cte1.sno = A.sno

results

A	10	1
A	10	1

Member 11621026 · Answer 2 · 2018-07-25T21:13:00

-- Option 1: derived table; sno = 1 keeps one row per (sname, marks) pair
declare @tbl table (id int identity(1,1), sname varchar(10), marks int)
insert into @tbl values ('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)

select *
from (
	select row_number() over (partition by sname, marks order by marks) as sno, sname, marks
	from @tbl
) as a
where sno = 1

-- Option 2: the same query written with a CTE (run as a separate batch, since @tbl is declared again)
declare @tbl table (id int identity(1,1), sname varchar(10), marks int)
insert into @tbl values ('A',10),('B',10),('A',10),('B',40),('C',10),('D',15),('A',9)

;with cte as
(
	select row_number() over (partition by sname, marks order by marks) as sno, sname, marks
	from @tbl
)
select * from cte where sno = 1
