Ben Osborne Ben Osborne - 2 months ago 14
SQL Question

SQL Server Best Date Matches Between Data Sets

I need to populate

@firstParty
with the best matches from
@thirdParty
.

For a record to match,
@firstParty.Registered
has to be within 31 days of
@thirdParty.Registered
.

In addition, any record from
@thirdParty
cannot be associated with more than one record in
@firstParty
.

I want to do this as efficiently as possible. SQL Server version is 2008, so features introduced after that can't be used.

This sample code is simplification of the actual code, which I can't post:

declare @firstParty table
(
FirstPartyId integer identity,
Registered date,

MinThirdPartyId integer,
MinThirdPartyRegistered date

);

insert into @firstParty (Registered)
values
('1/1/2017'), ('2/1/2017'), ('3/1/2017'), ('4/1/2017'), ('5/1/2017'), ('6/1/2017');

/*
dates in @firstParty and @thirdParty are not guaranteed to be unique
in all scenarios
*/

declare @thirdParty table
(
ThirdPartyId integer identity,
Registered date
);

insert into @thirdParty (Registered)
values
('03/02/2017'), ('04/30/2017');


declare @x integer = 1;
while @x <= (select max(FirstPartyId) from @firstParty) begin

declare @MinRegistered date = null;

--get minimum third party date within 31 days of registered date, that hasn't been used
select
@MinRegistered = min(tp.Registered)
from
@firstParty fp
join @thirdParty tp on
fp.Registered between dateadd(d, -31, tp.Registered) and dateadd(d, 31, tp.Registered)
left join @firstParty used on tp.ThirdPartyId = used.MinThirdPartyId
where
fp.FirstPartyId = @x
and used.MinThirdPartyId is null;

declare @MinThirdPartyId integer = null;

--get earliest ID of third party record with @MinRegistered
select top 1
@MinThirdPartyId = tp.ThirdPartyId
from
@firstParty fp
join @thirdParty tp on
tp.Registered = @MinRegistered
left join @firstParty used on tp.ThirdPartyId = used.MinThirdPartyId
where
fp.FirstPartyId = @x
and used.MinThirdPartyId is null
order by
tp.Registered,
tp.ThirdPartyId;


update @firstParty
set
MinThirdPartyId = @MinThirdPartyId,
MinThirdPartyRegistered = @MinRegistered
where
FirstPartyId = @x;


set @x = @x + 1;

end;


select
fp.FirstPartyId,
fp.Registered,
fp.MinThirdPartyId,
fp.MinThirdPartyRegistered
from
@firstParty fp;


Here are the results that I want:

FirstPartyId Registered MinThirdPartyId MinThirdPartyRegistered
------------ ---------- --------------- -----------------------
1 2017-01-01 NULL NULL
2 2017-02-01 NULL NULL
3 2017-03-01 1 2017-03-02
4 2017-04-01 NULL NULL
5 2017-05-01 2 2017-04-30
6 2017-06-01 NULL NULL


An approach that first populates all matches and then removes the non-optimal matches will not work, because if a @firstParty's optimal @thirdParty's record is removed, there could be a different record in @thirdParty that is still an acceptable match.

Answer Source

Perhaps something like this?

Example

Select A.FirstPartyId
      ,A.Registered
      ,MinThirdPartyId = B.ThirdPartyId
      ,MinThirdPartyRegistered = B.Registered
 From  @firstParty A
 Left  Join (
                Select B1.*
                      ,MinPartyID =B2.FirstPartyID
                 From  @thirdParty B1
                 Cross Apply (
                                Select Top 1 with ties *
                                 From @firstParty
                                 Where abs(DateDiff(DAY,B1.Registered,Registered))<=31
                                 Order By abs(DateDiff(DAY,B1.Registered,Registered)) 
                             ) B2
            ) B
  on (B.MinPartyID=A.FirstPartyId)

Returns

enter image description here