Sharath Sharath - 4 months ago 33
SQL Question

Calculate the 95th percentile difference between the actual and predicted columns in SQL

I have a PostgreSQL database like this

The tables and the respective columns with data type are


meas_id - integer(Foreign keyed to Measurement.meas_id)
actual_meas - integer
predicted_meas - integer
pdatetime - Timestamp with timezone (UTC)
status - Enum('completed', 'inprogress', 'nottaken')


meas_id - integer
meas_name - string

Meas_name has measurements length, breadth, width, height

For each of the measurements 'length' and 'breadth', I am trying to calculate the 95th percentile difference between actual and predicted values for all completed measurements within the last 30 days.

I am trying to do it this way but not getting it

MIN(Readings.actual_meas - Readings.predicted_meas) AS Difference
FROM Readings
ORDER BY Difference DESC
) AS NinetyFivePerc
JOIN Measurement
WHERE NinetyFivePerc.meas_id = Measurement.meas_id
AND NinetyFivePerc.pdatetime >= DATEADD(DAY, -30, GETDATE())
AND Measurement.meas_name IN ('length','breadth')
AND NinetyFivePerc.status = 'completed'

I am learning SQL and so kindly provide inputs on an optimized way of achieving it.


Postgres has the percentile_disc() and percentile_cont() aggregation functions.

So, you can just do:

SELECT m.meas_name, 
       PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY r.actual_meas - r.predicted_meas),
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY r.actual_meas - r.predicted_meas)
FROM Readings r JOIN
     measurements m
     ON r.meas_id = m.meas_id
WHERE m.meas_name IN ('length', 'breadth') AND
      r.status = 'completed'
GROUP BY m.meas_name;