H. Corbett H. Corbett - 5 months ago 11
SQL Question

Query for customers who have made purchases on a common but discontinuous set of dates

RDMS: PostgreSQL 9.5.3

I have a table ('activity') of the following form:

customerID | date | purchaseID
-----------------------------------------
1 | 2016-01-01 | 1
2 | 2016-01-01 | 2
3 | 2016-01-01 | 3
2 | 2016-01-02 | 4
1 | 2016-01-03 | 5
2 | 2016-01-03 | 6
3 | 2016-01-03 | 7
1 | 2016-01-04 | 8
2 | 2016-01-04 | 9
3 | 2016-01-05 | 10


From this table, I want to find all customers who have made purchases on the same dates as customerID 1. The customers purchase history needs to completely overlap with customerID 1, but not necessarily be limited to it -- extra purchases outside of the dates are fine, but should not be returned in the final results.

The result on the above data should be:

customerID | date | purchaseID
-----------------------------------------
2 | 2016-01-01 | 2
2 | 2016-01-02 | 5
2 | 2016-01-03 | 8


At the moment, I'm solving this through a loop in the application code and then dropping all NULL results, so the actual SQL is:

SELECT customerID,
date,
purchaseID
FROM activity
WHERE customerID <> 1
AND date = %date%


where %date% is the is an iteration variable through all of the dates that customerID 1 has made purchases. This isn't an elegant solution, and extremely slow for large numbers of purchases (millions) or customers (tens of thousands). Any suggestions would be welcome.

Thanks for reading--

Answer

One method is to use a self-join and aggregation:

select a.customerid
from activity a join 
     activity a1
     on a1.date = a.date and a1.customerid = 1
where a1.customerid <> a.customerid
group by a.customerID
having count(distinct a1.date) = (select count(distinct date) from activity where customerID = 1)

If you want the original records, you can use:

select a.*
from activity a
where a.customerId in (select a.customerid
                       from activity a join 
                            activity a1
                            on a1.date = a.date and a1.customerid = 1
                       where a1.customerid <> a.customerid
                       group by a.customerID
                       having count(distinct a1.date) = (select count(distinct date) from activity where customerID = 1)
                      );
Comments