An easy way to remove duplicate rows from a table in SQL Server 2008 is to use an undocumented feature called %%physloc%%. This pseudo column shows the physical location of a row.
Note that this feature is undocumented and unsupported, so use it at your own risk!
Here is a simple test case. First, create a test table:
CREATE TABLE TestTable (
Column1 varchar(1),
Column2 int
);
Add some rows, including a few duplicates:
INSERT INTO TestTable VALUES ('A', 1);
INSERT INTO TestTable VALUES ('A', 1);
INSERT INTO TestTable VALUES ('A', 2);
INSERT INTO TestTable VALUES ('B', 1);
INSERT INTO TestTable VALUES ('B', 2);
INSERT INTO TestTable VALUES ('B', 2);
INSERT INTO TestTable VALUES ('C', 2);
You can select the data to see that all seven rows are present:
SELECT *
FROM TestTable a
ORDER BY a.Column1, a.Column2;
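To get a feel for what %%physloc%% contains, the following sketch lists every row together with its physical location. It assumes the undocumented function sys.fn_PhysLocFormatter is available on your instance (it ships with SQL Server 2008, but like %%physloc%% it is unsupported). The column alias PhysicalLocation is just an illustrative name.

```sql
-- List each row with its physical location formatted as (file:page:slot).
-- sys.fn_PhysLocFormatter is undocumented and unsupported, like %%physloc%%.
SELECT sys.fn_PhysLocFormatter(a.%%physloc%%) AS PhysicalLocation,
       a.Column1,
       a.Column2
FROM TestTable a
ORDER BY a.Column1, a.Column2;
```

Rows with identical column values still get different locations, which is what makes it possible to tell duplicates apart.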
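Now let's delete the two duplicates using %%physloc%%. The subquery keeps the row with the smallest physical location in each group of identical values, and every other row is deleted: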
DELETE
FROM TestTable
WHERE TestTable.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM TestTable b
              GROUP BY b.Column1, b.Column2);
If you run the query again, you'll see that only five rows remain and the duplicates have been deleted:
SELECT *
FROM TestTable a
ORDER BY a.Column1, a.Column2;
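Instead of checking the result by eye, one possible sanity check is to group by all columns and look for groups with more than one row. This is a minimal sketch against the test table above, and the alias RowCnt is only illustrative.

```sql
-- Return every combination of values that occurs more than once.
-- After the duplicates have been deleted, this should return no rows.
SELECT a.Column1, a.Column2, COUNT(*) AS RowCnt
FROM TestTable a
GROUP BY a.Column1, a.Column2
HAVING COUNT(*) > 1;
```

Run before the DELETE, the same query lists the ('A', 1) and ('B', 2) groups with a count of 2 each.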
For more information about %%physloc%%, see Physical location of a row in SQL Server.
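Because %%physloc%% is unsupported, you may prefer a variant that relies only on documented features. The sketch below is a different technique from the one in this tip: it numbers the rows within each group of identical values using ROW_NUMBER() and deletes everything except the first row of each group. The names NumberedRows and RowNo are illustrative, and which of the identical rows is kept is arbitrary.

```sql
-- Documented alternative: number the rows inside each duplicate group
-- and delete all but the first one. Works on SQL Server 2005 and later.
WITH NumberedRows AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Column1, Column2
                              ORDER BY (SELECT NULL)) AS RowNo
    FROM TestTable
)
DELETE FROM NumberedRows
WHERE RowNo > 1;
```

Note that the statement before WITH must be terminated with a semicolon, as all the statements in this tip are.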
I've been a programmer since the mid-80s, using languages like assembler, C/C++, PL/I (mainframe environment), Pascal, VB (I know, I know, no comments please) and C#, and utilizing different techniques and tools.
However, I specialize in databases and database modeling. Mostly I have used products like Oracle (from version 6), SQL Server (from version 4.2), DB2 and Solid Server (nowadays an IBM product).
For the past 10+ years my main concerns have been dealing with different business processes and how to create software to implement and improve them. In my spare time (whatever that actually means) I'm also teaching and consulting on different areas of database management, development and database-oriented software design.