Anton Tarasenko Anton Tarasenko - 7 months ago 15
SQL Question

Getting the percentile the item belongs to in SQL

I'm trying to put together some simple stats, but stuck with computing the percentile that the author belongs to according to his total scores:

select
[by] author,
count(*) count,
sum(score) sum_score,
quantiles(sum(score), 101) percentile_sum_score,
from
[bigquery-public-data:hacker_news.stories]
group by
author


This code returns each author's
sum(score)
in the column
percentile_sum_score
(defined as
quantiles(sum(score), 101)
). But not the percentile the author is in, relative to other authors.

This happens in BigQuery, and some conventional functions are not available (docs).

Is there a way to get the right statistics?

Answer

You might be looking for PERCENT_RANK() function
Below is possible use for your example

SELECT
    author,
    [count],
    sum_score,
    PERCENT_RANK() OVER(ORDER BY sum_score DESC) percentile_sum_score,
FROM (
  SELECT
    [by] author,
    COUNT(1) [count],
    SUM(score) sum_score,
  FROM [bigquery-public-data:hacker_news.stories]
  GROUP BY author
)
Comments