Robert Robert - 4 months ago 14
SQL Question

How to create an efficient query which will count of the records by a specific time interval?

Which database I use?



I use a PostgreSQL 9.5.

What I need?



This's a part of my
data_store
tables:

id | starttime
-----+----------------------------
185 | 2011-09-12 15:24:03.248+02
189 | 2011-09-12 15:24:03.256+02
312 | 2011-09-12 15:24:06.112+02
313 | 2011-09-12 15:24:06.119+02
450 | 2011-09-12 15:24:09.196+02
451 | 2011-09-12 15:24:09.203+02
452 | 2011-09-12 15:24:09.21+02
... | ...


I would like to create the query which will count of the records by the specific time interval. For example, for the 4 seconds time interval - the query should return to me something like this:

starttime-from | starttime-to | count
---------------------+---------------------+---------
2011-09-12 15:24:03 | 2011-09-12 15:24:07 | 4
2011-09-12 15:24:07 | 2011-09-12 15:24:11 | 3
2011-09-12 15:24:11 | 2011-09-12 15:24:15 | 0
... | ... | ...


The most important things:


  1. The time interval depends on the user's choice. It could be
    1 second
    ,
    37 seconds
    ,
    50 minutes
    or some mix:
    2 month and 30 mintues
    . The available units for the time interval:
    millisecond
    ,
    second
    ,
    minute
    ,
    hour
    ,
    day
    ,
    month
    ,
    year
    . How you see, I need some generic/universal query for that BUT I could also create several query for each unit - it isn't a problem.

  2. The query should be efficient, because I work in a large database (20 million rows and more but in query I use only a part of this database, for example: 1 million).



The question is: How should the query look like to achieve that?

I tried to convert the solutions which I found in the following threads, but I didn't succeed:



What I have?



I deleted this section of my post for greater transparency of the post. This section wasn't necessary to give an answer my question. If you want to see what here was, look at the history of the post.

Answer

Your query seems complicated. You only need to generate the sequence of times and then use left join to bring them together . . . and aggregate:

select g.ts,  g.ts + interval '4 second', count(ds.id)
from (select generate_series(min(starttime), max(strttime), interval '4 second') as ts
      from data_store
     ) g left join
     data_store ds
     on ds.starttime >= g.ts and ds.starttime < g.ts + interval '4 second'
group by g.ts
order by g.ts;

Note: If you want the interval to begin on an exact second (and not have some strange number of milliseconds 999 times out of 1000), then use date_trunc().

EDIT:

It might be worth seeing if a correlated subquery is faster:

select gs.ts,
       (select count(*)
        from data_store ds
        where ds.starttime >= g.ts and ds.starttime < g.ts + interval '4 second'
       ) as cnt
from (select generate_series(min(starttime), max(strttime), interval '4 second') as ts
      from data_store
     ) g;