Bravetheif Bravetheif - 2 months ago 18
MySQL Question

Downsampling analytics data in MySQL or in R

I am storing analytics data in an MySQL database as a table with a timestamp and some data, and want to downsample (i.e group it within a time range) this data (by counting the number of entries) for displaying on an admin console, and I was wondering if it would be more efficient to select the data and downsample it with an R script, or if it would be better to use

GROUP BY UNIX_TIMESTAMP(timestamp) DIV <some time>


and do it on the database layer. Any other tips would also be appreciated.

Answer

If you can use dplyr, you could do it with something like the following:

library(dplyr)

yay <- 
  # Specify username and password in my.cnf
  src_mysql(host = "blah.com") %>%
  tbl("some_table") %>%
  # You will need to compute a grouping variable
  mutate(group = unix_timestamp(timestamp)) %>%
  group_by(group) %>%
  # This will return the number of rows in each group
  summarise(n = n()) %>%
  # This will execute the query and return a data.frame
  collect  
Comments