tonycpsu tonycpsu - 2 months ago 10
SQL Question

PostgreSQL query for calculating rotisserie baseball points

I am trying to write a PostgreSQL query to calculate fantasy baseball points, as described here. What I have so far is available in this SQLfiddle, which calculates the points for each statistic correctly, except in the case of ties. Points for teams that are tied are supposed to be calculated like so:


In the case of a tie, each team involved receives an average of the total points due — i.e., in the above example, if two teams were tied for first in a category, each would receive 9.5 points [(10 + 9) / 2 = 9.5].


You can see the errors in my approach in the first and second result sets in my SQLfiddle. In the first result set, the teams tied with 9 home runs should each get 3.5 points (sum of ranks 4 and 3 == 7, divided by 2) while in the second set, the teams tied with 33 RBI should also each get 3.5 (sum of ranks 5, 4, 3, and 2 == 14, divided by 4).

What would be the simplest way to correct these errors, and distribute the total points evenly among teams tied in the rankings of each statistic?

Answer

In my view, provided example is not a complete one, it'd be nice to have a candidate key among the provided fields.

  1. It is possible to use any aggregate function as window one. This removes the necessity for the sub-query to count all rows in the table.

Consider the output of the following query:

SELECT 
    hr,rbi,
    rank() OVER h AS hr_rank,
    row_number() OVER h AS hr_rn,
    count(*) OVER () - rank() OVER h + 1 AS hr_aprx,
    rank() OVER r AS rbi_rank,
    row_number() OVER r AS rbi_rn,
    count(*) OVER () - rank() OVER r + 1 AS rbi_aprx,
    count(*) OVER () AS cnt
FROM 
    stats
WINDOW h AS (ORDER BY hr DESC), r AS (ORDER BY rbi DESC);

This query provides the same information as your first 2 queries. If you'll look into the EXPLAIN (analyze, buffers) output for it, you'll see that table is being accessed only once.

I've named point columns as %_aprx here, for these are approximate points yet, we'll have to calculate the average.

  1. Now, as we have prepared some data for further calculations, we'll have to use a sub-query. This is due to the fact, that we must use our %_aprx columns for data grouping. I will use CTE here, as I find named sub-queries looking better.

Consider this query (also on sql-fiddle):

WITH ranks AS (
    SELECT 
        hr, rbi,
        rank() OVER h AS hr_rank,
        row_number() OVER h AS hr_rn,
        count(*) OVER () - rank() OVER h + 1 AS hr_aprx,
        rank() OVER r AS rbi_rank,
        row_number() OVER r AS rbi_rn,
        count(*) OVER () - rank() OVER r + 1 AS rbi_aprx,
        count(*) OVER () AS cnt
    FROM 
        stats
    WINDOW h AS (ORDER BY hr DESC), r AS (ORDER BY rbi DESC)
)
SELECT 
    hr, rbi,
    (avg(hr_rn) OVER h)::float AS hr_pts,
    (avg(rbi_rn) OVER r)::float AS rbi_pts,
    (avg(hr_rn) OVER h + avg(rbi_rn) OVER r)::float AS ttl_pts
FROM 
    ranks
WINDOW h AS (PARTITION BY hr_aprx), r AS (PARTITION BY rbi_aprx)
ORDER BY 
    ttl_pts DESC, hr_pts DESC;

I'm converting resulting type of avg() calls to float to get rid of the series of zeroes. You can choose to use round() function instead here though.

I've also added 2 ordering conditions, for ordering just by ttl_pts is not enough.

Note, that in the outer query's window definitions ORDER BY is missed out on purpose. With it, you'll get a running average effect (you can change query and see yourself).

Comments