Time after time I’ll get a table that has duplicates in it and I have to clear out all the duplicate entries while leaving a distinct set of records. There’s a couple of ways to solve the problem. Today I want to take you through my standard fix.
- Script the source table out to a temporary table.
- INSERT the DISTINCT records into the temporary table.
- TRUNCATE the original table.
- INSERT the records back into the source table from the temporary table.
Let’s run through these steps in AdventureWorks.
First, we’re going to set up a table that will be filled with duplicate records.
CREATE TABLE dbo.Contact( Title NVARCHAR(8) , FirstName NVARCHAR(50) , MiddleName NVARCHAR(50) , LastName NVARCHAR(50) , Suffix NVARCHAR(10) )
Next, we’re going to fill that table with two copies of all contacts in our Person.Contact table.
INSERT INTO dbo.Contact SELECT Title, FirstName, MiddleName, LastName, Suffix FROM Person.Contact UNION ALL SELECT Title, FirstName, MiddleName, LastName, Suffix FROM Person.Contact
OK, in case you’re not familiar with how to check for dupes, here’s a quick way to check. Select all the columns that should be DISTINCT, then GROUP by those columns, and add the HAVING COUNT(*) > 1 clause. This will show you any records that are duplicated in your table.
SELECT Title, FirstName, MiddleName, LastName, Suffix FROM dbo.Contact GROUP BY Title, FirstName, MiddleName, LastName, Suffix HAVING COUNT(*) > 1
I’m showing 19,907 records have been duplicated. Your mileage may vary.
Now, let’s create our temp table. My table of choice is a table variable. No need to create a table that will last longer than this operation, and since this is a one shot deal, I’m not very worried about optimizing the operation with indexes or anything too fancy.
DECLARE @Contact TABLE ( Title NVARCHAR(8) , FirstName NVARCHAR(50) , MiddleName NVARCHAR(50) , LastName NVARCHAR(50) , Suffix NVARCHAR(10) )
Now, we INSERT the DISTINCT records into our table variable.
INSERT INTO @Contact SELECT DISTINCT Title, FirstName, MiddleName, LastName, Suffix FROM dbo.Contact
Then we TRUNCATE the source table.
TRUNCATE TABLE dbo.Contact
Now sometimes you’ll have to keep records online while removing the duplicates, in which case you’ll have to come up with a way to delete the records that you keep records “online” while doing this delete. There are some ways to do it… can you think of any?
Finally, we want to insert the records back into the table.
INSERT INTO dbo.Contact SELECT Title, FirstName, MiddleName, LastName, Suffix FROM @contact
Double check to make sure you got all the duplicates cleared up.
SELECT Title, FirstName, MiddleName, LastName, Suffix FROM dbo.Contact GROUP BY Title, FirstName, MiddleName, LastName, Suffix HAVING COUNT(*) > 1
And that’s how you get rid of duplicates. I’d highly suggest applying a DISTINCT constraint to the table. I’d also suggest you consider a PRIMARY KEY, that way if you need to remove duplicates, you could remove them by ID, which would help you out if you needed to deal with duplicates while leaving your records “online”.
If you have any questions, please feel free to reach out. I’m here to help!