Remove dups in table

Question

0.00/5 (No votes)

See more:

I thought I had this but turns out it's dropping some it shouldn't.

I'm migrating data from an old table into a new table and the tables are dissimilar so am having to be a little creative since I'm SQL illiterate.

The table is defined as;

C#

public partial class User
{
    public int UserID { get; set; }
    public string Name { get; set; }
    public string ImageFile { get; set; }
    public string Password { get; set; }
    public string Email { get; set; }
    public string WebSite { get; set; }
    public string DisplayName { get; set; }
    public bool CanComment { get; set; }
    public bool IsActive { get; set; }
    public System.DateTime CreateDate { get; set; }
    public string Custom { get; set; }
}

And what I'm trying to do is delete all duplicate users using email. This is the query that I thought I had working but doesn't seem to be doing what I want?

SQL

delete from [User] where Email in
(select email from [User] group by Email having Count(Email) > 1)

Thanks for any help given

Posted 11-Feb-15 19:08pm

Mike Hankey

Add a Solution

Comments

Maciej Los 12-Feb-15 1:52am

Mike, is it SQL Server, MySQL, PostgreSQL, Access database?

2 solutions

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Maciej Los · Accepted Answer · 2015-02-11T19:59:00

Solution 1

I'd suggest to write query as follow:

SQL

SELECT u.*, ROW_NUMBER() OVER(ORDER BY Email, UserID) AS RowNo
FROM [User] as u

This should produce a recordset with RowNumbers.
To delete duplicates, use this:

SQL

DELETE
FROM (
    SELECT u.*, ROW_NUMBER() OVER(ORDER BY Email, UserID) AS RowNo
    FROM [User] as u
) AS t1
WHERE t1.RowNo>1

Another way is to use:

SQL

DELETE
FROM (
    SELECT u.*, ROW_NUMBER() OVER(PARTITION BY Email ORDER BY UserID) AS RowNo
    FROM [User] as u
) AS t1
WHERE t1.RowNo>1

Posted 11-Feb-15 19:59pm

Maciej Los

Updated 11-Feb-15 20:11pm

v2

Comments

Mike Hankey 12-Feb-15 2:06am

Great stuff thanks Maciej

Maciej Los 12-Feb-15 2:09am

You're welcome, Mike ;)
Cheers!

Wendelius 12-Feb-15 15:26pm

Good choices!

Maciej Los 12-Feb-15 15:28pm

Thank you, Mika ;)

Wendelius · Accepted Answer · 2015-02-12T09:16:00

Solution 2

Just to throw you one more option, if UserId is unique then simply

SQL

DELETE FROM [User] a
WHERE EXISTS (SELECT 1
              FROM [User] b
              WHERE b.email = a.email
              AND   b.userid > a.userid)

or you can use %%physloc%% if no key is present. For more info, see How to remove duplicate rows in SQL Server 2008 when no key is present[^]

Posted 12-Feb-15 9:16am

Wendelius

Comments

Maciej Los 12-Feb-15 15:25pm

2 times 5!

Mike Hankey 12-Feb-15 15:27pm

Thanks for the extra hand Mike I'll keep that one in mind also for the future.

Wendelius 12-Feb-15 15:33pm

Glad to help you :)