I'm trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time.
Here's the query I have so far:
SELECT p.name, e.date,
sum(sp.payout) OVER (ORDER BY e.date)
- sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss"
FROM result r
JOIN game g ON r.game_id = g.game_id
JOIN event e ON g.event_id = e.event_id
JOIN structure s ON g.structure_id = s.structure_id
JOIN structure_payout sp ON g.structure_id = sp.structure_id
AND r.position = sp.position
JOIN player p ON r.player_id = p.player_id
WHERE p.player_id = 17
GROUP BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER BY p.name, e.date ASC
GROUP BY p.name, e.date, e.event_id
column "sp.payout" must appear in the GROUP BY clause or be used in an
You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands
s.buyin to be included in the
GROUP BY clause.
By appending an
OVER clause, the aggregate function
sum() is turned into a window function, which aggregates values per partition while keeping all rows.
You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove
s.buyin from the
GROUP BY clause and get one row per
SELECT p.name , e.event_id , e.date , sum(sum(sp.payout)) OVER w - sum(sum(s.buyin )) OVER w AS "Profit/Loss" FROM player p JOIN result r ON r.player_id = p.player_id JOIN game g ON g.game_id = r.game_id JOIN event e ON e.event_id = g.event_id JOIN structure s ON s.structure_id = g.structure_id JOIN structure_payout sp ON sp.structure_id = g.structure_id AND sp.position = r.position WHERE p.player_id = 17 GROUP BY e.event_id WINDOW w AS (ORDER BY e.date, e.event_id) ORDER BY e.date, e.event_id;
In this expression:
sum(sum(sp.payout)) OVER w, the outer
sum() is a window function, the inner
sum() is an aggregate function.
PRIMARY KEY in their respective tables.
e.event_id to the
ORDER BY of the
WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) Also included
event_id in the result to distinguish multiple events per day.
While the query restricts to a single player (
WHERE p.player_id = 17), we don't need to add
GROUP BY and
ORDER BY. If one of the joins would multiply rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by
p.name could not repair the query then.
I also removed
e.date from the
GROUP BY clause. The primary key
e.event_id covers all columns of the input row since PostgreSQL 9.1.
If you change the query to return multiple players at once, adapt:
... WHERE p.player_id < 17 -- example - multiple players GROUP BY p.name, p.player_id, e.date, e.event_id -- e.date and p.name redundant WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id) ORDER BY p.name, p.player_id, e.date, e.event_id;
p.name is defined unique (?), group and order by
player_id additionally to get correct results in a deterministic sort order.
I only kept
GROUP BY to have identical sort order in all clauses, hoping for a performance benefit. Else, you can remove the columns there. (Similar for just
e.date in the first query.)