tonycpsu - 2 months ago 10
SQL Question

# PostgreSQL query for calculating rotisserie baseball points

I am trying to write a PostgreSQL query to calculate fantasy baseball points, as described here. What I have so far is available in this SQLfiddle, which calculates the points for each statistic correctly, except in the case of ties. Points for teams that are tied are supposed to be calculated like so:

In the case of a tie, each team involved receives an average of the total points due — i.e., in the above example, if two teams were tied for first in a category, each would receive 9.5 points [(10 + 9) / 2 = 9.5].

You can see the errors in my approach in the first and second result sets in my SQLfiddle. In the first result set, the teams tied with 9 home runs should each get 3.5 points (sum of ranks 4 and 3 == 7, divided by 2) while in the second set, the teams tied with 33 RBI should also each get 3.5 (sum of ranks 5, 4, 3, and 2 == 14, divided by 4).

What would be the simplest way to correct these errors, and distribute the total points evenly among teams tied in the rankings of each statistic?

In my view, provided example is not a complete one, it'd be nice to have a candidate key among the provided fields.

1. It is possible to use any aggregate function as window one. This removes the necessity for the sub-query to count all rows in the table.

Consider the output of the following query:

``````SELECT
hr,rbi,
rank() OVER h AS hr_rank,
row_number() OVER h AS hr_rn,
count(*) OVER () - rank() OVER h + 1 AS hr_aprx,
rank() OVER r AS rbi_rank,
row_number() OVER r AS rbi_rn,
count(*) OVER () - rank() OVER r + 1 AS rbi_aprx,
count(*) OVER () AS cnt
FROM
stats
WINDOW h AS (ORDER BY hr DESC), r AS (ORDER BY rbi DESC);
``````

This query provides the same information as your first 2 queries. If you'll look into the `EXPLAIN (analyze, buffers)` output for it, you'll see that table is being accessed only once.

I've named point columns as `%_aprx` here, for these are approximate points yet, we'll have to calculate the average.

1. Now, as we have prepared some data for further calculations, we'll have to use a sub-query. This is due to the fact, that we must use our `%_aprx` columns for data grouping. I will use CTE here, as I find named sub-queries looking better.

Consider this query (also on sql-fiddle):

``````WITH ranks AS (
SELECT
hr, rbi,
rank() OVER h AS hr_rank,
row_number() OVER h AS hr_rn,
count(*) OVER () - rank() OVER h + 1 AS hr_aprx,
rank() OVER r AS rbi_rank,
row_number() OVER r AS rbi_rn,
count(*) OVER () - rank() OVER r + 1 AS rbi_aprx,
count(*) OVER () AS cnt
FROM
stats
WINDOW h AS (ORDER BY hr DESC), r AS (ORDER BY rbi DESC)
)
SELECT
hr, rbi,
(avg(hr_rn) OVER h)::float AS hr_pts,
(avg(rbi_rn) OVER r)::float AS rbi_pts,
(avg(hr_rn) OVER h + avg(rbi_rn) OVER r)::float AS ttl_pts
FROM
ranks
WINDOW h AS (PARTITION BY hr_aprx), r AS (PARTITION BY rbi_aprx)
ORDER BY
ttl_pts DESC, hr_pts DESC;
``````

I'm converting resulting type of `avg()` calls to `float` to get rid of the series of zeroes. You can choose to use `round()` function instead here though.

I've also added 2 ordering conditions, for ordering just by `ttl_pts` is not enough.

Note, that in the outer query's window definitions `ORDER BY` is missed out on purpose. With it, you'll get a running average effect (you can change query and see yourself).