JorgeDS JorgeDS - 4 months ago 11
SQL Question

Calculating customer's trips based on transactions dates

Basic T-SQL user here. I'm having problems trying to complete a task and would appreciate some guidance. Apologies in advance for any errors as English is not my mother tongue.

I have a table with a lot of transactions, for the sake of simplicity let's say that I only have two columns: CUSTOMER_ID, which is my customer and DATE which is the date of the transaction.

My customers make a lot of transactions while they're in town but then they can spend weeks, months or even years before coming back and start making transactions again. I would like to somehow identify each one of those "Trips" and group the transactions involved, then I'd like to do thins like calculate trip duration, number of transactions, etc.

I'd like to consider a Trip as any new transaction occurring after an IDLE period of 10 days.

Let me try to better explain my request by using some simple example:

This is my transactions table:

+-------------+------------+
| CUSTOMER_ID | DATE |
+-------------+------------+
| JHON | 01-01-2016 |
| JHON | 01-02-2016 |
| PEDRO | 01-02-2016 |
| JHON | 01-05-2016 |
| MIKE | 01-05-2016 |
| MIKE | 01-10-2016 |
| JHON | 01-07-2016 |
| … | … |
| JHON | 02-15-2016 |
| JHON | 02-18-2016 |
| MIKE | 02-19-2016 |
| MIKE | 02-19-2016 |
+-------------+------------+


So far I've made this query to enumerate the customer's visits:

SELECT
CUSTOMER_ID,
DATE,
ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM

FROM
TRANSACTIONS
WHERE
CUSTOMER_ID IN ('JHON','MIKE','PEDRO')


Running that query would give a result similar to this:

+-------------+------------+-----------+
| CUSTOMER_ID | DATE | VISIT_NUM |
+-------------+------------+-----------+
| JHON | 01-01-2016 | 1 |
| JHON | 01-02-2016 | 2 |
| JHON | 01-07-2016 | 3 |
| JHON | 02-15-2016 | 4 |
| JHON | 02-18-2016 | 5 |
| MIKE | 01-05-2016 | 1 |
| MIKE | 01-10-2016 | 2 |
| MIKE | 02-19-2016 | 3 |
| MIKE | 02-19-2016 | 4 |
| PEDRO | 01-02-2016 | 1 |
+-------------+------------+-----------+


Now comes the tricky part: I need somehow to create a query that (maybe using the above query as a previous step) show me the customer with they trip info, continuing with the example the ideal result would be like this:

+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON | 1 | 01-01-2016 | 01-07-2016 | 7 | 3 |
| JHON | 2 | 02-15-2016 | 02-18-2016 | 3 | 2 |
| MIKE | 1 | 01-05-2016 | 01-10-2016 | 5 | 2 |
| MIKE | 2 | 02-19-2016 | 02-19-2016 | 1 | 2 |
| PEDRO | 1 | 01-02-2016 | 01-02-2016 | 1 | 1 |
+-------------+----------+---------------+-------------+---------------+--------------+


As you can see, Mr. Jhon came 3 times during the month of January and came back again in February. As more than 10 days passed from his last transaction in January, I'd like to consider his new set of transactions as a new "trip" for him. Mike also had some activity in January, and came back in February too, in his second trip he made two transactions in the same day, I'd like to account that too. If a customer only came a single day and had some activity (as the case of Mr. Pedro) I'd also like to consider that single-day, single-transaction record as a trip record.

I would greatly appreciate any light on this, I honestly have no idea on how to proceed (I've been reading about cursors but it seems like dark magic at this point, cant figure out a way to implement them on this).

Apologies again for any grammatical errors and any possible omissions on my part. I'd further clarify anything if necessary.

Answer

I found the perfect answer elsewhere. All credit goes to to the Reddit user nvarscar for the amazing solution!

I'll just copy his/her answer below, in case someone else need it in the future:

You may use a window function feature, which helps you to aggregate rows between current row and all preceding ones. The code looks too long, but at least you will see the steps taken.

DECLARE @t TABLE 
    ([CUSTOMER_ID] varchar(5), [DATE] datetime)
;

INSERT INTO @t
    ([CUSTOMER_ID], [DATE])
VALUES
    ('JHON', '2016-01-01 00:00:00'),
    ('JHON', '2016-01-02 00:00:00'),
    ('PEDRO', '2016-01-02 00:00:00'),
    ('JHON', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-10 00:00:00'),
    ('JHON', '2016-01-07 00:00:00'),
    ('JHON', '2016-02-15 00:00:00'),
    ('JHON', '2016-02-18 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('JHON', '2016-02-01 00:00:00'),
    ('JHON', '2016-02-02 00:00:00'),
    ('PEDRO', '2016-03-02 00:00:00'),
    ('JHON', '2016-03-05 00:00:00'),
    ('MIKE', '2016-05-05 00:00:00'),
    ('MIKE', '2016-05-10 00:00:00'),
    ('JHON', '2016-03-07 00:00:00'),
    ('JHON', '2016-04-15 00:00:00'),
    ('JHON', '2016-04-18 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00')
;


WITH CTE1 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM @t
GROUP BY 
  [CUSTOMER_ID]
, [DATE]
)
, CTE2 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
, CTE3 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
, CTE4 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
SELECT 
  [CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag