jeudi 22 octobre 2015

Identifying duplicates within a table: looking for query advice

So I am trying to identify duplicated contact records within an account, and looking for the best way to do this. There is a an account table, and a contact table. Below is the query I've come up with to give me what I need, but I feel like there is probably a better/more efficient way to do this, so looking for any feedback/advice. Thanks in advance!

SELECT * FROM sysdba.CONTACT a WITH(NOLOCK)
WHERE EXISTS
(
SELECT ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL FROM sysdba.CONTACT b WITH(NOLOCK)
GROUP BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL
HAVING COUNT(*) > 1
AND a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL
)
ORDER BY ACCOUNTID, FIRSTNAME, LASTNAME, EMAIL

Here is another way I can do this, but having to use DISTINCT seems ugly..

SELECT DISTINCT a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL FROM sysdba.CONTACT a WITH(NOLOCK)
JOIN sysdba.CONTACT b WITH(NOLOCK)
ON a.ACCOUNTID = b.ACCOUNTID AND a.FIRSTNAME = b.FIRSTNAME AND a.LASTNAME = b.LASTNAME AND a.EMAIL = b.EMAIL AND a.CONTACTID != b.CONTACTID
ORDER BY a.CONTACTID, a.FIRSTNAME, a.LASTNAME, a.EMAIL

Aucun commentaire:

Enregistrer un commentaire