dimanche 21 juin 2015

SQL SERVER LEFT JOIN on multiple columns with unwanted duplicates

I have been running in circles with a query that is driving me nuts.

The background:
I have two tables, and unfortunately, both have duplicate records. (Dealing with activity logs if that puts it into perspective). Each table comes from a different system and I am trying to join the data together to get a sudo full picture (I realize that I won't get a perfect view because there is no "event key" shared between the two systems; I am attempting to match on a composite of meta data).

Here is what I am working with:

    Table1 t1
    ------------
    JobID      CustID      Name      ActionDate      IsDuplicate
    12345      11111      Ryan      1/1/2015 01:20:20      False
    12345      11112      Bob      1/1/2015 02:10:20      False
    12345      11111      Ryan      1/1/2015 04:15:35      True
    12346      11113      Jim      1/1/2015 05:10:40      False
    12346      11114      Jeb      1/1/2015 06:10:40      False
    12346      11111      Ryan      1/1/2015 07:10:30      False

    Table2 t2
    ------------
    ResponseID      CustID      ActionDate      Browser
    11123      10110      12/1/2014 23:32:15      IE
    12345      11111      1/1/2015 03:20:20      IE
    12345      11112      1/1/2015 05:10:20      Firefox
    12345      11111      1/1/2015 06:15:35      Firefox
    12346      11113      1/1/2015 07:10:40      Chrome
    12346      11114      1/1/2015 08:10:40      Chrome
    12346      11111      1/1/2015 10:10:30      Safari
    12213      11123      2/1/2015 01:10:30      Chrome

Please note a few things:
- JobID and ResponseID are the same thing
- JobID and ResponseID are indicators of an event on the site (people are responding to an event)
- Action date does not match (system 2 has about an inconsistent 2 hour delay on it but never more that 3 hours delay)
- Note Table2 doesnt have a duplicate flag
- table 1 (~2,000 records) is significantly smaller than table 2 (~16,000 records)
- Note Cust 11111 is bopping around on browsers, taking the same action twice on job 12345 at different times and only taking action once on job 12346

What I am looking for:

    Result (ideal)
    ------------
    t1.JobID      t1.CustID      t1.Name      t1.ActionDate      t2.Browser
    12345      11111      Ryan      1/1/2015 01:20:20      IE
    12345      11112      Bob      1/1/2015 02:10:20      Firefox
    12345      11111      Ryan      1/1/2015 04:15:35      Firefox
    12346      11113      Jim      1/1/2015 05:10:40      Chrome
    12346      11114      Jeb      1/1/2015 06:10:40      Chrome
    12346      11111      Ryan      1/1/2015 07:10:30      Safari

Note that I JUST want matches for records in Table1. I am getting tons of duplicates because of the join...Which is frustrating.

Here is what I have so far (which I can humbly can say; isn't really close):

    SELECT
    t1.JobID,
    t1.CustID,
    t1.Name,
    t1.ActionDate,
    t2.Browser
    FROM
    Table1 t1
    LEFT OUTER JOIN
    Table2 t2
    ON
    t1.JobID=t2.ResponseID AND 
    t1.CustID=t2.CustID AND
    DATEPART(dd,t1.ActionDate)=DATEPART(dd,t2.ActionDate)

Any help with this would be greatly appreciated.

Aucun commentaire:

Enregistrer un commentaire