Petr Shypila Petr Shypila - 1 year ago 104
SQL Question

SQL. Unexpected result of calling PERCENTILE_CONT function

Actually I have understanding of how PERCENTILE_CONT works, but I get different result by calculating percentiles by hand and by calling a function.
Here is the dataset:


To calculate 25% I sum up
and then divide it by 2. And the result is

To calculate 50% I sum up
and then divide it by 2. And the result is

To calculate 75% I sum up
and then divide it by 2. And the result is

However, when I run this sql:

percentile_cont(0.25) within group(order by YEAR_2_FTE ASC),
percentile_cont(0.5) within group(order by YEAR_2_FTE ASC),
percentile_cont(0.75) within group(order by YEAR_2_FTE ASC) from sr_database
where firm_id=999;

I get such result:

25%: 367.529664875

50%: 397.53491395

75%: 425.076839075

What am I missing?

Answer Source

You're correctly taking the point halfway between the 4th and 5th value to calculate the 50th percentile. But taking the halfway point between the 2nd and 3rd for the 25th percentile (and similarly for the 75th) isn't correct. This is best seen with an illustration:

enter image description here

This shows 8 points and 4 lines drawn between them to make 4 equally sized segements. The 25th and 75th percentile markers fall closer to the 3rd and 6th values respectively.

The Oracle documentation describes how these are calculated:

The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).

In your example, for the 25th percentile P = 0.25 and N = 8 so:

RN = (1+ 0.25*(8-1))
   = 2.75

So CRN = 3 and FRN = 2

The documentation goes on to say:

The final result will be:

If (CRN = FRN = RN) then the result is

(value of expression from row at RN)

Otherwise the result is

(CRN - RN) * (value of expression for row at FRN) +

(RN - FRN) * (value of expression for row at CRN)

Plugging in the values above, the "Otherwise..." part applies for the calculation i.e:

Result = (3 - 2.75) * 350 + (2.75 - 2) * 373.3728856
       = 367.5296642

And similarly for the 75th percentile:

 RN = (1+ 0.75*(8-1))
    = 6.25
CRN = 7
FRN = 6
Result = (7 - 6.25) * 414.9842786 + (6.25 - 6) * 455.3545205
       = 425.076839075
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download