I need to run some statistical analysis on intervals, i.e. the difference between two datetime fields in a table.
According to the aggregate function documentation here. The aggregate
... (These are separated out merely to avoid cluttering the listing of
As I said in a comment, to work out sample standard deviation manually, at some point you multiply an interval by an interval. PostgreSQL doesn't support that.
To work around that issue, reduce the interval to hours or minutes or seconds (or whatever). This turns out to be a lot simpler than working out the calculation manually, and it suggests why PostgreSQL doesn't support this kind of calculation out of the box.
First, a function from the PostgreSQL general mailing list:
-- Converts an interval to seconds using its day, hour, minute and second fields.
-- Month and year fields are not included in the result.
CREATE OR REPLACE FUNCTION interval_to_seconds(interval)
RETURNS double precision AS $$
  SELECT (extract(days from $1) * 86400)
       + (extract(hours from $1) * 3600)
       + (extract(minutes from $1) * 60)
       + extract(seconds from $1);
$$ LANGUAGE SQL;
Now we can take the standard deviation of a simple set of intervals.
with intervals (i) as (
  values (interval '1 hour'), (interval '2 hour'), (interval '3 hour'),
         (interval '4 hour'), (interval '5 hour')
)
, intervals_as_seconds as (
  select interval_to_seconds(i) as seconds
  from intervals
)
select stddev(seconds) as in_sec, stddev(seconds)/60 as in_min
from intervals_as_seconds;
in_sec              in_min
double precision    double precision
--
5692.09978830308    94.8683298050514
You can verify the results however you like.
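To tie this back to the two datetime columns in the question, here is a minimal sketch. It assumes a hypothetical table jobs with columns started_at and finished_at; none of these names come from the original question. It uses the built-in extract(epoch from ...) in place of interval_to_seconds so that it runs on its own. The sample rows are durations of 1 to 5 hours, so the result should agree with the figures above, up to how your PostgreSQL version displays floating-point values.

```sql
-- Hypothetical table and column names, for illustration only.
create temporary table jobs (
  started_at  timestamp not null,
  finished_at timestamp not null
);

insert into jobs (started_at, finished_at) values
  ('2024-01-01 08:00', '2024-01-01 09:00'),  -- 1 hour
  ('2024-01-01 08:00', '2024-01-01 10:00'),  -- 2 hours
  ('2024-01-01 08:00', '2024-01-01 11:00'),  -- 3 hours
  ('2024-01-01 08:00', '2024-01-01 12:00'),  -- 4 hours
  ('2024-01-01 08:00', '2024-01-01 13:00');  -- 5 hours

-- Subtracting two timestamps yields an interval; reduce it to seconds
-- before aggregating, because stddev() needs a plain number here.
with durations as (
  select extract(epoch from (finished_at - started_at))::double precision as seconds
  from jobs
)
select stddev(seconds)      as in_sec,
       stddev(seconds) / 60 as in_min
from durations;
```

The explicit cast to double precision keeps the result type the same across PostgreSQL versions, since extract returns numeric in newer releases.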
Now let's say you wanted hour granularity instead of seconds. Clearly, the choice of granularity is highly application dependent. You might define another function, interval_to_hours(interval). You can use a very similar query to calculate the standard deviation.
with intervals (i) as (
  values (interval '1 hour'), (interval '2 hour'), (interval '3 hour'),
         (interval '4 hour'), (interval '5 hour')
)
, intervals_as_hours as (
  select interval_to_hours(i) as hours
  from intervals
)
select stddev(hours) as stddev_in_hrs
from intervals_as_hours;
stddev_in_hrs
double precision
--
1.58113883008419
The value for standard deviation in hours is clearly different from the value in minutes or in seconds. But they measure exactly the same thing. The point is that the "right" answer depends on the granularity (units) you want to use, and there are a lot of choices. (From microseconds to centuries, I imagine.)
Also, consider this statement.
select interval_to_hours(interval '45 minutes')
interval_to_hours
double precision
--
0
Is that the right answer? You can't say; the right answer is application-dependent. I can imagine applications that would want 45 minutes to be considered as 1 hour. I can also imagine applications that would want 45 minutes to be considered as 1 hour for some calculations, and as 0 hours for other calculations.
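The answer never shows the body of interval_to_hours, so the following is only a sketch of what it might look like. The truncating variant is an assumption chosen to match the results shown above (1 to 5 hours map to 1 to 5, and 45 minutes maps to 0). The rounding variant, with the made-up name interval_to_hours_rounded, shows the other policy the paragraph describes.

```sql
-- Hypothetical definitions; the original answer does not include them.

-- Truncating variant: minutes and seconds are dropped,
-- so 45 minutes counts as 0 hours.
create or replace function interval_to_hours(interval)
returns double precision as $$
  select (extract(day from $1) * 24 + extract(hour from $1))::double precision;
$$ language sql;

-- Rounding variant (the name is an assumption): the whole interval is
-- converted to hours and rounded, so 45 minutes counts as 1 hour.
create or replace function interval_to_hours_rounded(interval)
returns double precision as $$
  select round(extract(epoch from $1) / 3600)::double precision;
$$ language sql;

select interval_to_hours(interval '45 minutes')         as truncated,  -- 0
       interval_to_hours_rounded(interval '45 minutes') as rounded;    -- 1
```

Neither variant is more correct than the other; which one to use is exactly the application-dependent choice described above.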
And think about this question. How many seconds are in a month?
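A short sketch of why that question has no single answer. The dates are arbitrary examples. The first query shows that a calendar month's length depends on which month it is. The second shows the fixed 30-day convention PostgreSQL falls back on when an interval has no anchor date.

```sql
-- Anchored to real dates, months differ in length.
select timestamp '2024-02-01' - timestamp '2024-01-01' as january,   -- 31 days
       timestamp '2024-03-01' - timestamp '2024-02-01' as february;  -- 29 days (2024 is a leap year)

-- With no anchor date, extract(epoch ...) treats a month as 30 days.
select extract(epoch from interval '1 month') as month_in_seconds;   -- 2592000 = 30 * 86400

-- A conversion built only from day/hour/minute/second fields does not
-- see the month field at all: the day field of '1 month' is 0.
select extract(day from interval '1 month') as days_field;           -- 0
```

Intervals produced by subtracting two timestamps carry only days and a time part, so they avoid this ambiguity. It arises only for intervals written with months or years.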
And I think that's why PostgreSQL doesn't support this kind of calculation out of the box. The right way to do it with interval arguments is too application-dependent.
Later . . .
I found this discussion on one of the PostgreSQL mailing lists.