Delete Duplicates from a table without primary key

On a friday night right after the work I got a call from a developer requesting me to delete duplicate copies of data from a couple of tables. Though I was quickly not able to do it but with in a half hour time I figured out how to do it. Below are the steps:

1)First check which records are duplicate using the below query

SELECT t.* FROM tablename t JOIN (SELECT t.columnval FROM tablename t GROUP BY t.columnval HAVING COUNT(*) > 1) x ON x.columnval = t.columnval

2) Create a temp table (I created the table in the same database) with the same structure as the original table (from which data has to be deleted)  but with an additional column which is an identity column

3)Use the SQL Import data to copy over the data from original table to the temp table

4)Delete the data from the original table

5)On the temp table run the below query:

SELECT min(ID), columnval1, columnval2, Count(*) FROM tablename GROUP BY columnval1, columnval2

6)Now delete the duplicate data from the temp table by running the below query:

DELETE FROM tablename WHERE ID NOT IN (SELECT min(ID) FROM tablename GROUP BY columnval1,columnval2)

7) And then drop the identity column from the temp table

8)Finally export the data from temp table to the original table.

Advertisements
This entry was posted in SQL Server DBA Stuff and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s