8bytez - 2 months ago 6x
R Question

# Creating groups based on UTC Time

I have a dataset which looks like this:

```r
str(m12)
'data.frame':   48178 obs. of  10 variables:
 $ created_utc  : POSIXct, format: "2016-04-19 02:59:02" "2016-05-01 01:51:58" "2016-04-20 15:11:24" "2016-04-26 23:09:13" ...
 $ WC           : int  122 24 27 34 43 30 18 49 52 16 ...
 $ Analytic     : num  74.05 6.55 1.32 26.21 11.64 ...
 $ Clout        : num  20.6 1 35.5 38.4 40.8 ...
 $ Authentic    : num  80.8 91.3 92.5 14.7 87.5 ...
....
```

I want to calculate the average score for every variable for every single day.

I tried this:

```r
mean <- aggregate(m12[, 2:10], list(m12$created_utc), mean)
```

It calculates the mean for every second, but I need it for every day. Do you know of a way to achieve that?

Sorry for not providing sample data. I simply do not know how to create a POSIXct variable.
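For reference, a `POSIXct` column can be built from character timestamps with `as.POSIXct()` and an explicit time zone. The sketch below recreates a small, hypothetical `m12` using only the first four values that are visible in the `str()` output above, and only four of the ten columns, so it illustrates the structure rather than reproducing the real data. On the real data frame, `dput(head(m12))` prints R code that rebuilds its first rows, which is the usual way to share sample data in a question.

```r
# Hypothetical sample: first four values of four columns, taken from the str() output
m12 <- data.frame(
  created_utc = as.POSIXct(c("2016-04-19 02:59:02", "2016-05-01 01:51:58",
                             "2016-04-20 15:11:24", "2016-04-26 23:09:13"),
                           tz = "UTC"),   # parse as UTC, matching the column name
  WC        = c(122L, 24L, 27L, 34L),
  Analytic  = c(74.05, 6.55, 1.32, 26.21),
  Clout     = c(20.6, 1, 35.5, 38.4),
  Authentic = c(80.8, 91.3, 92.5, 14.7)
)
str(m12)
```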

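Convert `created_utc` to the `Date` class, which strips off the time of day, and use the result as the grouping variable; the `mean` of each column is then computed per calendar day. Because `created_utc` holds UTC times, passing `tz = "UTC"` to `as.Date()` makes the day boundaries UTC midnights regardless of the session's time zone.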
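A small sketch of why the conversion fixes the grouping, using hypothetical timestamps: grouping by the raw `POSIXct` values gives one group per distinct second, while the converted `Date` values collapse to one group per day. The last line shows that the calendar day depends on the time zone used for the conversion (`America/New_York` is just an illustrative choice).

```r
# Two timestamps on the same UTC day, one on the next day
ts <- as.POSIXct(c("2016-04-19 02:59:02", "2016-04-19 23:10:00",
                   "2016-04-20 15:11:24"), tz = "UTC")
length(unique(ts))                # 3: one group per distinct timestamp
days <- as.Date(ts, tz = "UTC")   # time of day dropped, day taken in UTC
length(unique(days))              # 2: one group per calendar day
days

# 02:59 UTC is still the previous evening in New York, so the day differs
as.Date(ts[1], tz = "America/New_York")
```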

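```r
# Replace the timestamp with its UTC calendar day, then average every other column per day
aggregate(. ~ created_utc,
          data = transform(m12, created_utc = as.Date(created_utc, tz = "UTC")),
          FUN = mean, na.rm = TRUE, na.action = na.pass)
```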
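The asker's original `aggregate()` call also works once the grouping vector is the day rather than the full timestamp. A minimal sketch on hypothetical toy data with only two value columns, so `m12[, -1]` stands in for `m12[, 2:10]`; `na.rm = TRUE` is passed through to `mean()`:

```r
# Hypothetical toy data: three rows on 2016-04-19 (one NA), one row on 2016-04-20
m12 <- data.frame(
  created_utc = as.POSIXct(c("2016-04-19 02:59:02", "2016-04-19 11:30:00",
                             "2016-04-19 23:09:13", "2016-04-20 15:11:24"), tz = "UTC"),
  WC       = c(122L, 24L, 34L, 27L),
  Analytic = c(74.05, 6.55, NA, 1.32)
)

# Group by the day, not by the second-resolution timestamp
daily <- aggregate(m12[, -1],
                   by = list(day = as.Date(m12$created_utc, tz = "UTC")),
                   FUN = mean, na.rm = TRUE)
daily
```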

`dplyr` and `data.table` offer alternatives that are usually faster on large data:

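```r
library(dplyr)
m12 %>%
  group_by(created_utc = as.Date(created_utc, tz = "UTC")) %>%
  # across() replaces the deprecated summarise_each()/funs(); grouping columns are skipped
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))
```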

Or

```r
library(data.table)
setDT(m12)[, lapply(.SD, mean, na.rm = TRUE), by = .(created_utc = as.Date(created_utc, tz = "UTC"))]  # setDT() converts m12 in place
```
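If adding a package is not an option, base R's `rowsum()` can compute per-day column sums in one pass; dividing them by per-day counts of non-missing values gives the same means as `mean(..., na.rm = TRUE)`, including `NaN` for a day whose values are all `NA`. This is a swapped-in technique, not part of the original answer, sketched here on hypothetical toy data:

```r
# Hypothetical toy data: three rows on 2016-04-19 (one NA), one row on 2016-04-20
m12 <- data.frame(
  created_utc = as.POSIXct(c("2016-04-19 02:59:02", "2016-04-19 11:30:00",
                             "2016-04-19 23:09:13", "2016-04-20 15:11:24"), tz = "UTC"),
  WC       = c(122L, 24L, 34L, 27L),
  Analytic = c(74.05, 6.55, NA, 1.32)
)

day  <- as.character(as.Date(m12$created_utc, tz = "UTC"))  # "YYYY-MM-DD" sorts chronologically
vals <- as.matrix(m12[, -1])                    # numeric value columns only
sums <- rowsum(vals, day, na.rm = TRUE)         # per-day sums, NAs skipped
ns   <- rowsum(1 * !is.na(vals), day)           # per-day counts of non-NA values
daily_means <- sums / ns                        # rows are days, columns are variables
daily_means
```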
Source: Stack Overflow