user90252 - 2 months ago
SQL Question

Constructing a table of binary values indicating the existence of entries in another table

I have several tables containing order information for different items. A customer may appear multiple times across the tables, but each item appears in only one table. I would like to create a new table showing all the items a customer purchased in a given year. There should be one column per item, holding a binary value that indicates whether the customer purchased that item in that year.

In other words, I would like to convert all the tables listing single item orders (e.g. customer 1 purchased item a in Nov 2007 and item c in May 2007) into yearly transactions (e.g. customer 1 has a transaction for 2007 of {a,c} or [1,0,1,0]). I want to bin the individual orders into yearly transactions so I can mine the association rules.

Minimal Working Example:

Table1 contains orders for items a and b. Table2 contains orders for items c and d.

CREATE TABLE table1
(
    orderId INT,
    customerId INT,
    orderDate DATE,
    item VARCHAR(1)
);

CREATE TABLE table2
(
    orderId INT,
    customerId INT,
    orderDate DATE,
    item VARCHAR(1)
);

INSERT INTO table1 (orderId, customerId, orderDate, item)
VALUES
    (1, 1, '2007-11-11', 'a'),
    (2, 2, '2008-03-20', 'b'),
    (3, 3, '2009-07-11', 'a');

INSERT INTO table2 (orderId, customerId, orderDate, item)
VALUES
    (4, 2, '2008-01-01', 'c'),
    (5, 1, '2007-05-15', 'c'),
    (6, 1, '2009-02-02', 'd');


I am using UNION ALL to combine the tables because some order IDs may overlap even though the orders are distinct.

SELECT *
INTO #table3
FROM
(
    SELECT *
    FROM table1
    UNION ALL
    SELECT *
    FROM table2
) a;


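Here is my attempt at a solution, but it is not very elegant. More importantly, it does not evaluate the CASE expressions per year as desired: each flag reflects whether the customer ever purchased the item, not whether they purchased it in that year.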

SELECT customerId,
       DATEPART(YEAR, orderDate) AS orderYear,
       CASE
           WHEN customerId IN (SELECT DISTINCT customerId
                               FROM #table3
                               WHERE item = 'a')
           THEN 1
           ELSE 0
       END AS itemA,
       CASE
           WHEN customerId IN (SELECT DISTINCT customerId
                               FROM #table3
                               WHERE item = 'b')
           THEN 1
           ELSE 0
       END AS itemB,
       CASE
           WHEN customerId IN (SELECT DISTINCT customerId
                               FROM #table3
                               WHERE item = 'c')
           THEN 1
           ELSE 0
       END AS itemC,
       CASE
           WHEN customerId IN (SELECT DISTINCT customerId
                               FROM #table3
                               WHERE item = 'd')
           THEN 1
           ELSE 0
       END AS itemD
FROM #table3
ORDER BY customerId, orderDate;


The desired result would look like:

CREATE TABLE desiredResult
(
    customerId INT,
    orderYear INT,
    itemA INT,
    itemB INT,
    itemC INT,
    itemD INT
);

INSERT INTO desiredResult (customerId, orderYear, itemA, itemB, itemC, itemD)
VALUES
    (1, 2007, 1, 0, 1, 0),
    (1, 2009, 0, 0, 0, 1),
    (2, 2008, 0, 1, 1, 0),
    (3, 2009, 1, 0, 0, 0);


Is there an easier way to get the result I want? Is this something that PIVOT might be useful for?

Answer

I would do this using conditional aggregation:

SELECT customerId, orderYear,
       MAX(CASE WHEN item = 'a' THEN 1 ELSE 0 END) AS itemA,
       MAX(CASE WHEN item = 'b' THEN 1 ELSE 0 END) AS itemB,
       MAX(CASE WHEN item = 'c' THEN 1 ELSE 0 END) AS itemC,
       MAX(CASE WHEN item = 'd' THEN 1 ELSE 0 END) AS itemD
FROM (SELECT customerId, YEAR(orderDate) AS orderYear, item FROM table1
      UNION ALL
      SELECT customerId, YEAR(orderDate) AS orderYear, item FROM table2
     ) t
GROUP BY customerId, orderYear;

This also eliminates the need for temporary tables.
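
As for the PIVOT part of the question: PIVOT can probably produce the same shape, though it is not obviously simpler. The following is only a sketch, assuming SQL Server (T-SQL) and the table1/table2 definitions from the question. The helper column purchased and the aliases src and p are names made up for illustration. Because PIVOT groups implicitly by every column that is neither aggregated nor pivoted, the inner query selects only customerId, the order year, and item, and leaves out orderId.

```sql
-- Sketch only: assumes SQL Server and the table1/table2 schema from the question.
SELECT customerId,
       orderYear,
       -- COUNT gives the number of orders per item; collapse it to a 0/1 flag.
       CASE WHEN [a] > 0 THEN 1 ELSE 0 END AS itemA,
       CASE WHEN [b] > 0 THEN 1 ELSE 0 END AS itemB,
       CASE WHEN [c] > 0 THEN 1 ELSE 0 END AS itemC,
       CASE WHEN [d] > 0 THEN 1 ELSE 0 END AS itemD
FROM (
    -- "purchased" is a hypothetical helper column that gives PIVOT something to count.
    SELECT customerId, YEAR(orderDate) AS orderYear, item, 1 AS purchased FROM table1
    UNION ALL
    SELECT customerId, YEAR(orderDate) AS orderYear, item, 1 AS purchased FROM table2
) AS src
PIVOT (
    COUNT(purchased) FOR item IN ([a], [b], [c], [d])
) AS p
ORDER BY customerId, orderYear;
```

Either way the item list has to be written out by hand, so for a fixed set of items the conditional aggregation above is arguably easier to read. If the set of items is not known in advance, both approaches would need dynamic SQL to build the column list.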
