code123 code123 - 3 months ago 11
R Question

Split date into YYYY-MM-DD-HH-MM-SS and aggregate date (R)

How can one split the following

datetime
into
year-month-day-hour-minute-second
? The date was created using:

datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
by="hour",tz="GMT"))


The ultimate goal is to aggregate
x
which is at
hourly
resolution into
6-hourly
resolution. Probably it is possible to
aggregate datetime
without needing to split it?

datetime x
1 2015-04-01 00:00:00 0.0
2 2015-04-01 01:00:00 0.0
3 2015-04-01 02:00:00 0.0
4 2015-04-01 03:00:00 0.0
5 2015-04-01 04:00:00 0.0
6 2015-04-01 05:00:00 0.0
7 2015-04-01 06:00:00 0.0
8 2015-04-01 07:00:00 0.0
9 2015-04-01 08:00:00 0.0
10 2015-04-01 09:00:00 0.0
11 2015-04-01 10:00:00 0.0
12 2015-04-01 11:00:00 0.0
13 2015-04-01 12:00:00 0.0
14 2015-04-01 13:00:00 0.0
15 2015-04-01 14:00:00 0.0
16 2015-04-01 15:00:00 0.0
17 2015-04-01 16:00:00 0.0
18 2015-04-01 17:00:00 0.0
19 2015-04-01 18:00:00 0.0
20 2015-04-01 19:00:00 0.0
21 2015-04-01 20:00:00 0.0
22 2015-04-01 21:00:00 0.0
23 2015-04-01 22:00:00 1.6
24 2015-04-01 23:00:00 0.2
25 2015-04-02 00:00:00 1.5
26 2015-04-02 01:00:00 1.5
27 2015-04-02 02:00:00 0.5
28 2015-04-02 03:00:00 0.0
29 2015-04-02 04:00:00 0.0
30 2015-04-02 05:00:00 0.0
31 2015-04-02 06:00:00 0.0
32 2015-04-02 07:00:00 0.5
33 2015-04-02 08:00:00 0.3
34 2015-04-02 09:00:00 0.0
35 2015-04-02 10:00:00 0.0
36 2015-04-02 11:00:00 0.0
37 2015-04-02 12:00:00 0.0
38 2015-04-02 13:00:00 0.0
39 2015-04-02 14:00:00 0.0
40 2015-04-02 15:00:00 0.0
41 2015-04-02 16:00:00 0.0
42 2015-04-02 17:00:00 0.0
43 2015-04-02 18:00:00 0.0
44 2015-04-02 19:00:00 0.0
45 2015-04-02 20:00:00 0.0
46 2015-04-02 21:00:00 0.0
47 2015-04-02 22:00:00 0.0
48 2015-04-02 23:00:00 0.0
....


The output should be very close to:

YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
2015-04-01 00:00:00 2015-04-01 06:00:00 2015-04-01 12:00:00 2015-04-01 18:00:00
2015-04-02 00:00:00 2015-04-02 06:00:00 2015-04-02 12:00:00 2015-04-02 18:00:00
.....


I appreciate your thoughts on this.

EDIT

How to implement @r2evans answer on a list object such as:

x = runif(5856)
flst1=list(x,x,x,x)

flst1=lapply(flst1, function(x){x$datetime <- as.POSIXct(x$datetime, tz = "GMT"); x})

sixhours1=lapply(flst1, function(x) {x$bin <- cut(x$datetime,sixhours);x})

head(sixhours1[[1]],n=7)

ret=lapply(sixhours1, function(x) aggregate(x$precip, list(x$bin), sum,na.rm=T))

head(ret[[1]],n=20)

Answer

Your minimal data is incomplete, so I'll generate something random:

dat <- data.frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                                        as.POSIXct("2015-11-30 23:59:59", tz = "GMT"), 
                                        by = "hour",tz = "GMT"),
                  x = runif(5856))
# the "1+" ensures we extend at least to the end of the datetimes;
# without it, the last several rows in "bin" would be NA
sixhours <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                       1 + as.POSIXct("2015-11-30 23:59:59", tz = "GMT"), 
                       by = "6 hours",tz = "GMT")

# this doesn't have to go into the data.frame (could be a separate
# vector), but I'm including it for easy row-wise comparison
dat$bin <- cut(dat$datetime, sixhours)

head(dat, n=7)
#              datetime          x                 bin
# 1 2015-04-01 00:00:00 0.91022534 2015-04-01 00:00:00
# 2 2015-04-01 01:00:00 0.02638850 2015-04-01 00:00:00
# 3 2015-04-01 02:00:00 0.42486354 2015-04-01 00:00:00
# 4 2015-04-01 03:00:00 0.90722845 2015-04-01 00:00:00
# 5 2015-04-01 04:00:00 0.24540085 2015-04-01 00:00:00
# 6 2015-04-01 05:00:00 0.60360906 2015-04-01 00:00:00
# 7 2015-04-01 06:00:00 0.01843313 2015-04-01 06:00:00
tail(dat)
#                 datetime         x                 bin
# 5851 2015-11-30 18:00:00 0.5963204 2015-11-30 18:00:00
# 5852 2015-11-30 19:00:00 0.2503440 2015-11-30 18:00:00
# 5853 2015-11-30 20:00:00 0.9600476 2015-11-30 18:00:00
# 5854 2015-11-30 21:00:00 0.6837394 2015-11-30 18:00:00
# 5855 2015-11-30 22:00:00 0.9093506 2015-11-30 18:00:00
# 5856 2015-11-30 23:00:00 0.9197769 2015-11-30 18:00:00
nrow(dat)
# [1] 5856

The work:

ret <- aggregate(dat$x, list(dat$bin), mean)
nrow(ret)
# [1] 976
head(ret)
#               Group.1         x
# 1 2015-04-01 00:00:00 0.5196193
# 2 2015-04-01 06:00:00 0.4770019
# 3 2015-04-01 12:00:00 0.5359483
# 4 2015-04-01 18:00:00 0.8140603
# 5 2015-04-02 00:00:00 0.4874332
# 6 2015-04-02 06:00:00 0.6139554
tail(ret)
#                 Group.1         x
# 971 2015-11-29 12:00:00 0.6881228
# 972 2015-11-29 18:00:00 0.4791925
# 973 2015-11-30 00:00:00 0.5793872
# 974 2015-11-30 06:00:00 0.4809868
# 975 2015-11-30 12:00:00 0.5157432
# 976 2015-11-30 18:00:00 0.7199298
Comments