mbourgon mbourgon - 9 months ago 28
SQL Question

Using a (Recursive?) CTE + Window Functions to zero out sales orders?

I am trying to use a recursive CTE + window functions to find the last outcome of a series of buy/sell orders.

First, here's some nomenclature:


  • field_id is the store's ID.

  • Field_number is an order number, but can be reused by the same person

  • Field_date is the date of the initial order.

  • Field_inserted is when this specific transaction occcurred.

  • Field_sale is whether we bought or returned it.



Unfortunately, because of the way the systems work, I do NOT get the cost when an item is returned, so figuring out the last outcome for an order is complicated (did we wind up selling any). I need to match the purchase with the sale, Which normally works pretty well. However, there are cases such as below when it fails, and I'm trying to find a way to do this in one pass, possibly using a recursive CTE.

Here's some code.

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:00', 'Buy'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(1, 100, '20170311','20170311 01:02:01', 'buy'),
(2, 100, '20170311','20170311 01:03:00', 'REtu'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');


Now to remove the buys that were then returned. The ISNULL is because I'm the NOT IN will ignore all the rows that have NULL for the _lead/_lag values.

WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)


And I felt pretty smug and thought I had it. However, that's the simple case. Buy, Return, Buy, Return. Let's try another case, Buy Buy Return Return, which is still valid, but obviously would result in a net of 0..

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');


WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'sell')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'sell' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)


When you do this, though, you realize that it found direct matches, but now there's still a Buy/Return pair, and I'd like to cancel that out.

It's at this point I'm stuck. I've done Recursive CTEs before, but for whatever reason I can't figure out how to recurse and make it cancel out 1/1/100 and 4/1/100. All I've managed to do is have it choke on the recursion.

DECLARE @tablea TABLE (field_id int, field_number CHAR(3), field_date datetime, field_inserted DATETIME, field_sale varchar(4))
INSERT INTO @tablea
VALUES
(1, 100, '20170311','20170311 01:00:00', 'Buy'),
(1, 100, '20170311','20170311 01:01:00', 'Buy'),
(1, 100, '20170311','20170311 01:02:00', 'Retu'),
(1, 100, '20170311','20170311 01:03:00', 'Retu'),
(2, 100, '20170311','20170311 01:03:00', 'Buy'),
(1, 110, '20170311','20170311 01:03:00', 'Buy');

WITH cte AS
(SELECT
ROW_NUMBER() OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS row_num,
field_id,
field_number,
field_date,
field_sale,
field_inserted,
lead(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lead,
lag(field_sale) OVER (PARTITION BY field_id, field_number, field_date ORDER BY field_inserted) AS field_sale_lag
FROM @tablea
--)
--SELECT * FROM cte
--WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
--AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)

UNION ALL
SELECT
ROW_NUMBER() OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS row_num,
cte.field_id,
cte.field_number,
cte.field_date,
cte.field_sale,
cte.field_inserted,
lead(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lead,
lag(cte.field_sale) OVER (PARTITION BY cte.field_id, cte.field_number, cte.field_date ORDER BY cte.field_inserted) AS field_sale_lag
FROM @tablea INNER JOIN cte ON cte.field_date = [@tablea].field_date AND cte.field_id = [@tablea].field_id AND cte.field_number = [@tablea].field_number
)
SELECT * FROM cte
WHERE NOT (cte.field_sale = 'Buy' AND ISNULL(field_sale_lead,'') = 'Retu')--AND field_sale_lead IS NOT null)
AND NOT (cte.field_sale = 'Retu' AND ISNULL(field_sale_lag,'') = 'buy' )--AND field_sale_lag IS NOT NULL)

Answer Source

We can tackle this without loops or recursion by using a common table expression and row_number() like so:

If I am understanding your question correctly, you want to remove sales that have been returned , and for each 'retu' it should remove the most recent 'buy'.

First we will add id using row_number() to our rowset so we can uniquely identify our rows.

Next, we add br_rn (short for Buy/Return RowNumber) partitioned by field_id, field_number, field_date, but we will also add field_sale to the partition; and we will order it by field_inserted desc. This will let us match each 'retu' with the most recent 'buy', and once we can do that, we can eliminate all of the pairs with not exists():

;with cte as (
  select 
      id = row_number() over (
        order by field_id, field_number, field_date, field_inserted asc
        ) 
    , field_id
    , field_number
    , field_date 
    , field_inserted 
    , field_sale
    , br_rn = row_number() over (
        partition by field_id, field_number, field_date, field_sale
        order by field_inserted desc
        ) 
  from @tablea
)
select 
    id 
  , field_number
  , field_date
  , field_inserted
  , field_sale
from cte
where not exists (
  select 1
  from cte as i
  where i.field_id = cte.field_id
    and i.field_number = cte.field_number
    and i.br_rn = cte.br_rn
    and i.id <> cte.id
    )
order by id

rextester demo: http://rextester.com/TKXOC61533

For this input:

  (1, 100, '20170311','20170311 01:00:00', 'Buy') 
, (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Retu')
, (1, 100, '20170311','20170311 01:03:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');

returns:

+----+--------------+------------+---------------------+------------+
| id | field_number | field_date |   field_inserted    | field_sale |
+----+--------------+------------+---------------------+------------+
|  5 |          110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
|  6 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
+----+--------------+------------+---------------------+------------+

and for this input:

  (1, 100, '20170311','20170311 01:01:00', 'Buy')
, (1, 100, '20170311','20170311 01:02:00', 'Buy')
, (1, 100, '20170311','20170311 01:03:00', 'Buy') 
, (1, 100, '20170311','20170311 01:04:00', 'Retu')
, (1, 100, '20170311','20170311 01:05:00', 'Buy') 
, (1, 100, '20170311','20170311 01:06:00', 'Retu')
, (1, 100, '20170311','20170311 01:07:00', 'Retu')
, (2, 100, '20170311','20170311 01:03:00', 'Buy')
, (1, 110, '20170311','20170311 01:03:00', 'Buy');

returns:

+----+--------------+------------+---------------------+------------+
| id | field_number | field_date |   field_inserted    | field_sale |
+----+--------------+------------+---------------------+------------+
|  1 |          100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy        |
|  8 |          110 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
|  9 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |
+----+--------------+------------+---------------------+------------+

It may help illustrate what we are doing to take a look what the cte is returning before we eliminate any pairs.

Looking at just the set that needs filtering, before we filter it:

+----+----------+--------------+------------+---------------------+------------+-------+
| id | field_id | field_number | field_date |   field_inserted    | field_sale | br_rn |
+----+----------+--------------+------------+---------------------+------------+-------+
|  1 |        1 |          100 | 2017-03-11 | 2017-03-11 01:01:00 | Buy        |     4 |
|  2 |        1 |          100 | 2017-03-11 | 2017-03-11 01:02:00 | Buy        |     3 |
|  3 |        1 |          100 | 2017-03-11 | 2017-03-11 01:03:00 | Buy        |     2 |
|  4 |        1 |          100 | 2017-03-11 | 2017-03-11 01:04:00 | Retu       |     3 |
|  5 |        1 |          100 | 2017-03-11 | 2017-03-11 01:05:00 | Buy        |     1 |
|  6 |        1 |          100 | 2017-03-11 | 2017-03-11 01:06:00 | Retu       |     2 |
|  7 |        1 |          100 | 2017-03-11 | 2017-03-11 01:07:00 | Retu       |     1 |
+----+----------+--------------+------------+---------------------+------------+-------+

Looking at it like this, we can easily see that the 'buy' order id 1 has a br_rn of 4 and there is no associated 'retu'.