Sharath Sharath - 3 months ago 27
SQL Question

Calculate the 95th percentile difference between the actual and predicted columns in SQL

I have a PostgreSQL database like this

The tables and the respective columns with data type are

Readings

meas_id - integer(Foreign keyed to Measurement.meas_id)
actual_meas - integer
predicted_meas - integer
pdatetime - Timestamp with timezone (UTC)
status - Enum('completed', 'inprogress', 'nottaken')


Measurement

meas_id - integer
meas_name - string

Meas_name has measurements length, breadth, width, height


For each of the measurements 'length' and 'breadth', I am trying to calculate the 95th percentile difference between actual and predicted values for all completed measurements within the last 30 days.

I am trying to do it this way but not getting it

SELECT
Measurement.meas_name,
MIN(Readings.actual_meas - Readings.predicted_meas) AS Difference
FROM
(
SELECT TOP 95 PERCENT
FROM Readings
ORDER BY Difference DESC
) AS NinetyFivePerc
JOIN Measurement
WHERE NinetyFivePerc.meas_id = Measurement.meas_id
AND NinetyFivePerc.pdatetime >= DATEADD(DAY, -30, GETDATE())
AND Measurement.meas_name IN ('length','breadth')
AND NinetyFivePerc.status = 'completed'


I am learning SQL and so kindly provide inputs on an optimized way of achieving it.

Answer

Postgres has the percentile_disc() and percentile_cont() aggregation functions.

So, you can just do:

SELECT m.meas_name, 
       PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY r.actual_meas - r.predicted_meas),
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY r.actual_meas - r.predicted_meas)
FROM Readings r JOIN
     measurements m
     ON r.meas_id = m.meas_id
WHERE m.meas_name IN ('length', 'breadth') AND
      r.status = 'completed'
GROUP BY m.meas_name;