Click here to Skip to main content
15,889,595 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I'm trying to use the AdventureWorks database to generate some test data for another database. When I execute the following query it produces 24 duplicate primary keys. So the query will not run unless I turn off the PK on the target db table. The source db does not contain these duplicates.

SQL
INSERT INTO [TestDb] (TestId, Name, Location)
SELECT
  p.[BusinessEntityID],
  p.[FirstName]+ISNULL(p.[MiddleName], '')+p.[LastName],
  a.[City]
FROM [AdventureWorks2008R2].[Person].[Person] AS p
INNER JOIN [AdventureWorks2008R2].[Person].[BusinessEntityAddress] AS b
ON p.BusinessEntityID = b.BusinessEntityID
INNER JOIN [AdventureWorks2008R2].[Person].[Address] AS a
ON a.AddressID = b.AddressID;


Any suggestion as to why the duplicates are being created? Is there a better way to generate test data? What is the best way to delete duplicate rows, given that I don't care if I lose information? Regards, Dave
Posted

If you want unique keys, as in a primary, for the table then don't set it. Don't insert the primary key
 
Share this answer
 
Comments
david006 8-Nov-11 23:08pm    
Thanks Mark, I had thought of that but wanted to keep the PK of test data in sync with source data. I have worked around that requirement and using auto-generated PK in test table.
It is still a strange anomaly though. With no duplications I end up with the exact number of rows as the set with duplicates. Must be a bug.
You are getting duplicates because in this query BusinessEntityID is not unique.

BusinessEntityID is unique for every Person. But every person may have more than one Address.
By joining together Person with BusinessEntityAdress you get a row for every Address the person has. So if a Person has two Addresses you get two rows.

BusinessEntityID  (No column name)  City
2996                  AmandaSCook         Everett
2996                  AmandaSCook         Everett
2997                  AmandaLMorgan         San Gabriel


In this example AmandaSCook has two entries in BusinessEntityAddress.
But you still take Person.BusinessEntityID as key to insert into TestDB. That's where your duplicates and PK violations come from.

Given you just want testdata and don't care for the information, try this:
SQL
select
  p.[BusinessEntityID],
  p.[FirstName]+ISNULL(p.[MiddleName], '')+p.[LastName],
  a.[City]
FROM [AdventureWorks2008R2].[Person].[Person] AS p
INNER JOIN [AdventureWorks2008R2].[Person].[BusinessEntityAddress] AS b
ON p.BusinessEntityID = b.BusinessEntityID
INNER JOIN [AdventureWorks2008R2].[Person].[Address] AS a
ON a.AddressID = b.AddressID
group by  p.[BusinessEntityID],
  p.[FirstName]+ISNULL(p.[MiddleName], '')+p.[LastName],
  a.[City]
having count(*) = 1
 
Share this answer
 
Comments
david006 10-Nov-11 5:49am    
Yes, I can't believe I missed that. Thank you.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900