Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?
Please use these table/column names in your answer:
This is an example of the
greatest-n-per-group problem that has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.* FROM customer c JOIN purchase p1 ON (c.id = p1.customer_id) LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND (p1.date < p2.date OR p1.date = p2.date AND p1.id < p2.id)) WHERE p2.id IS NULL;
Explanation: given a row
p1, there should be no row
p2 with the same customer and a later date (or in the case of ties, a later
id). When we find that to be true, then
p1 is the most recent purchase for that customer.
Regarding indexes, I'd create a compound index in
purchase over the columns (
id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g.
EXPLAIN on MySQL.
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.