The Traveling Coder - 1 year ago
SQL Question

BigQuery: how to group and count rows within rolling timestamp window?

I have some experience with MongoDB and I'm learning about BigQuery. I'm trying to perform the following task, and I don't know how to do it using BigQuery's standard SQL.

I have a table with the following data. It contains events that occur on different website urls. Timestamp represents when the given event occurred. For example, the first row means, "event 'xx' occurred on url 'a.html' at 2016-10-18 15:55:16 UTC."

event_id | url    | timestamp
xx       | a.html | 2016-10-18 15:55:16 UTC
xx       | a.html | 2016-10-19 16:68:55 UTC
xx       | a.html | 2016-10-25 20:55:57 UTC
yy       | b.html | 2016-10-18 15:58:09 UTC
yy       | a.html | 2016-10-18 08:32:43 UTC
zz       | a.html | 2016-10-20 04:44:22 UTC
zz       | c.html | 2016-10-21 02:12:34 UTC

I want to count how many times each event occurred on each url over a rolling 3-day window. In other words, I want to be able to say the following:

  • "on the url 'a.html', during the interval [2016-10-18 00:00:00 UTC, 2016-10-21 00:00:00 UTC), event 'xx' occurred twice."

  • "on the url 'a.html', during the interval [2016-10-19 00:00:00 UTC, 2016-10-22 00:00:00 UTC), event 'xx' occurred once."

  • "on the url 'a.html', during the interval [2016-10-20 00:00:00 UTC, 2016-10-23 00:00:00 UTC), event 'xx' occurred zero times." (NOTE: THIS DOES NOT NEED TO BE RETURNED AS A ROW. The absence of this row can imply that the event occurred zero times.)

Some notes: my table receives over 100k rows per day, and event frequency varies widely. For example, in one day, event 'xx' will occur ~10,000 times while event 'zz' will occur ~0-2 times.

Given my limited SQL knowledge, I didn't want to provide structure for the resulting table, because I figured that might incorrectly limit possible answers. Thanks!

Answer Source

Below is for BigQuery Standard SQL (see Enabling Standard SQL).

I am using ts as the field name (instead of timestamp, as in your example) and assume this field is of the TIMESTAMP data type.

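WITH dailyAggregations AS (
  SELECT
    DATE(ts) AS day,
    url,
    event_id,
    -- start of the day as Unix seconds; used to order the window frame
    UNIX_SECONDS(TIMESTAMP(DATE(ts))) AS sec,
    COUNT(1) AS events
  FROM yourTable
  GROUP BY day, url, event_id, sec
)
SELECT
  url, event_id, day, events,
  SUM(events)
    OVER(PARTITION BY url, event_id ORDER BY sec
      RANGE BETWEEN 259200 PRECEDING AND CURRENT ROW
  ) AS rolling3daysEvents
FROM dailyAggregations
-- ORDER BY url, event_id, day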

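The value 259200 is 3 x 24 x 3600, i.e. three days expressed in seconds, so you can change it to whatever rolling period you need. Note that the window frame bounds are inclusive and sec is truncated to the start of each day, so 259200 PRECEDING covers the current day plus the three preceding days. Use 172800 (2 x 24 x 3600) if the window should span exactly three calendar days.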
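To check what a given frame size counts, the query's logic can be modelled outside BigQuery. The following is a minimal Python sketch, not the answer's own code: it assumes one row per event reduced to (event_id, url, day), and the names rolling_events, day_to_sec, ROWS and HYPOTHETICAL_ROWS are made up for illustration. It aggregates per day, then sums each day's count together with every earlier day whose sec falls within the frame.

```python
from collections import Counter
from datetime import date, datetime, timezone

# Sample rows from the question; only the date part of each timestamp is used.
ROWS = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "a.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]

# Hypothetical rows (not from the question): two events exactly 3 days apart.
HYPOTHETICAL_ROWS = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 21)),
]


def day_to_sec(day):
    """Unix seconds at 00:00:00 UTC of `day` (plays the role of the sec column)."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def rolling_events(rows, preceding_sec):
    """Per (url, event_id, day), sum the daily counts whose sec lies in
    [sec - preceding_sec, sec], both bounds inclusive."""
    # Step 1: daily aggregation, like the GROUP BY in the CTE.
    daily = Counter((url, event_id, day) for event_id, url, day in rows)
    # Step 2: inclusive trailing frame within each (url, event_id) partition.
    result = {}
    for url, event_id, day in daily:
        sec = day_to_sec(day)
        result[(url, event_id, day)] = sum(
            n
            for (u, e, d), n in daily.items()
            if u == url and e == event_id
            and sec - preceding_sec <= day_to_sec(d) <= sec
        )
    return result


sample = rolling_events(ROWS, 3 * 24 * 3600)
print(sample[("a.html", "xx", date(2016, 10, 19))])  # 2 (the 18th and the 19th)

key = ("a.html", "xx", date(2016, 10, 21))
print(rolling_events(HYPOTHETICAL_ROWS, 259200)[key])  # 2: the 18th is still inside the frame
print(rolling_events(HYPOTHETICAL_ROWS, 172800)[key])  # 1: the frame is the 19th through the 21st
```

The last two lines show the boundary effect: with 259200 an event three days earlier is still counted, with 172800 it is not.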
