Mathieu Mathieu - 24 days ago 6
R Question

Aggregate function in R zoo returns error

I have been using this line of code daily for a very long time and something broke after returning to standard time. I am trying to use

aggregate.zoo
to rank hourly data by day. The period of data i am looking at does not include DST time, so I don't understand what the issue is. I am using the
zoo
package. Here is the structure of the data:

require(zoo)

structure(c(15.52, 14.56, 14.31, 14.17, 13.75, 15.3, 25.57, 25.39,
23.43, 22.92, 23.31, 23.44, 22.09, 21.28, 21, 20.94, 27.16, 32.73,
33.74, 29.12, 24.78, 21.44, 18.95, 17.08, 17.9, 17.54, 16.45,
16.59, 16.09, 17.23, 25.31, 25.43, 24.93, 24.47, 23.69, 21.53,
19.61, 19.53, 19.38, 19.38, 24.59, 29.03, 30.02, 23.78, 20.44,
21.39, 18.79, 21.7, 20.5, 20.63, 18.57, 19.41, 19.2, 15.23, 23.48,
24.89, 24.79, 24.01, 23.18, 22.5, 20.88, 21.12, 20, 20.55, 27.83,
31.21, 28.29, 26.1, 23.31, 21.64, 18.19, 17.28, 17.87, 17.63,
16.48, 17.27, 17.04, 17.79, 19.78, 20.92, 23.53, 25.45, 24.37,
21.89, 21.22, 21.03, 19.92, 20.14, 24.9, 27.88, 28.54, 29.14,
24.93, 19.24, 21.75, 18.66, 19.21, 18.49, 17.08, 15.92, 15.51,
15.5, 16.1, 16.61, 18.54, 20.15, 20.36, 20.32, 19.94, 19.13,
18.58, 18.71, 21.1, 26.77, 28.98, 27.12, 24.71, 20.83, 18.2,
16.68, 17.56, 16.86, 15.79, 15.47, 15.49, 17.66, 24.05, 24.97,
25.1, 25.74, 25.32, 24.98, 24.13, 24.05, 23.45, 23.84, 26.16,
31.2, 32.57, 30.19, 26.1, 22, 18.9, 17.85, 17.7, 16.53, 15.32,
14.77, 15.06, 17.54, 24.94, 26.06, 25.52, 25.85, 25.53, 24.97,
24.17, 24.19, 23.55, 23.78, 26.35, 31.49, 33.06, 29.9, 25.52,
21.62, 18.77, 17.77, 17.87, 16.75, 15.38, 14.69, 15, 17.84, 25.05,
25.5, 24.92, 25.46, 25.03, 24.81, 24.22, 23.54, 22.95, 22.9,
25.82, 30.87, 32.49, 29.17, 25.13, 21.56, 18.58, 17.67, 17.06,
15.96, 14.45, 13.93, 14.27, 16.64, 23.29, 24.17, 24.07, 24.22,
24.06, 24.06, 23.24, 23.05, 22.39, 22.54, 25.07, 29.89, 31.53,
28.24, 24.28, 20.78, 18.09, 17.02), index = structure(c(1478671200,
1478674800, 1478678400, 1478682000, 1478685600, 1478689200, 1478692800,
1478696400, 1478700000, 1478703600, 1478707200, 1478710800, 1478714400,
1478718000, 1478721600, 1478725200, 1478728800, 1478732400, 1478736000,
1478739600, 1478743200, 1478746800, 1478750400, 1478754000, 1478757600,
1478761200, 1478764800, 1478768400, 1478772000, 1478775600, 1478779200,
1478782800, 1478786400, 1478790000, 1478793600, 1478797200, 1478800800,
1478804400, 1478808000, 1478811600, 1478815200, 1478818800, 1478822400,
1478826000, 1478829600, 1478833200, 1478836800, 1478840400, 1478844000,
1478847600, 1478851200, 1478854800, 1478858400, 1478862000, 1478865600,
1478869200, 1478872800, 1478876400, 1478880000, 1478883600, 1478887200,
1478890800, 1478894400, 1478898000, 1478901600, 1478905200, 1478908800,
1478912400, 1478916000, 1478919600, 1478923200, 1478926800, 1478930400,
1478934000, 1478937600, 1478941200, 1478944800, 1478948400, 1478952000,
1478955600, 1478959200, 1478962800, 1478966400, 1478970000, 1478973600,
1478977200, 1478980800, 1478984400, 1478988000, 1478991600, 1478995200,
1478998800, 1479002400, 1479006000, 1479009600, 1479013200, 1479016800,
1479020400, 1479024000, 1479027600, 1479031200, 1479034800, 1479038400,
1479042000, 1479045600, 1479049200, 1479052800, 1479056400, 1479060000,
1479063600, 1479067200, 1479070800, 1479074400, 1479078000, 1479081600,
1479085200, 1479088800, 1479092400, 1479096000, 1479099600, 1479103200,
1479106800, 1479110400, 1479114000, 1479117600, 1479121200, 1479124800,
1479128400, 1479132000, 1479135600, 1479139200, 1479142800, 1479146400,
1479150000, 1479153600, 1479157200, 1479160800, 1479164400, 1479168000,
1479171600, 1479175200, 1479178800, 1479182400, 1479186000, 1479189600,
1479193200, 1479196800, 1479200400, 1479204000, 1479207600, 1479211200,
1479214800, 1479218400, 1479222000, 1479225600, 1479229200, 1479232800,
1479236400, 1479240000, 1479243600, 1479247200, 1479250800, 1479254400,
1479258000, 1479261600, 1479265200, 1479268800, 1479272400, 1479276000,
1479279600, 1479283200, 1479286800, 1479290400, 1479294000, 1479297600,
1479301200, 1479304800, 1479308400, 1479312000, 1479315600, 1479319200,
1479322800, 1479326400, 1479330000, 1479333600, 1479337200, 1479340800,
1479344400, 1479348000, 1479351600, 1479355200, 1479358800, 1479362400,
1479366000, 1479369600, 1479373200, 1479376800, 1479380400, 1479384000,
1479387600, 1479391200, 1479394800, 1479398400, 1479402000, 1479405600,
1479409200, 1479412800, 1479416400, 1479420000, 1479423600, 1479427200,
1479430800, 1479434400, 1479438000, 1479441600, 1479445200), class = c("POSIXct",
"POSIXt"), tzone = "America/New_York"), class = "zoo")


All I'm trying to do is rank the data by day, so the line I have been using successfully for a long time is:

ordered.price = aggregate(z, as.Date(index(z), tz='EST'),
FUN=rank, ties.method = "first", na.last = FALSE)


When running this, I get this error:


Error in zoo(df, ix[!is.na(ix)]) :
“x” : attempt to define invalid zoo object



I am at lost here, especially since the code was functioning properly for so long. I'm using R 3.2.3. Any help is really appreciated.

Answer

Below we give some approaches that avoid the error. (1) is not likely what you want but it will give us some insight before we get to (2) and (3).

1) padding There are different numbers of elements returned by FUN for different dates so it can't create a rectangular result.

lens <- aggregate(z,as.Date(index(z),tz='EST'), FUN = length)

giving:

> lens
2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 2016-11-16 2016-11-17 2016-11-18 
        23         24         24         24         24         24         24         24         24          1 

Padding to a length of 24 will allow it to not give an error:

aggregate(z,as.Date(index(z),tz='EST'), FUN = 
  function(x) replace(rep(NA, max(lens)), seq_along(x), rank(x, ties="first", na.last=F)))

giving:

2016-11-09  6  4  3  2  1  5 19 18 15 13 14 16 12 10  9  8 20 22 23 21 17 11  7 NA
2016-11-10  4  7  6  2  3  1  5 21 22 20 18 16 15 12 11  9 10 19 23 24 17 13 14  8
2016-11-11 13  7  9  3  5  4  1 17 20 19 18 15 14 10 11  6  8 22 24 23 21 16 12  2
2016-11-12  4  7  5  1  3  2  6  9 12 17 21 18 16 14 13 10 11 19 22 23 24 20  8 15
2016-11-13 11 14  8  6  3  2  1  4  5  9 16 18 17 15 13 10 12 20 22 24 23 21 19  7
2016-11-14  4  6  5  3  1  2  7 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20  9  8
2016-11-15  7  6  4  3  1  2  5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17  9  8
2016-11-16  5  7  4  3  1  2  6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18  9  8
2016-11-17  7  6  4  3  1  2  5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20  9  8
2016-11-18  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

2) time zone Although padding avoids the error it is not likely what you really want. It is likely that the alignment with dates is off which is why it is producing 23 elements for the first date and 1 element for the last. If you fix that up so that there are 24 elements per date then it will work without padding.

Using this value for tz seems to give 24 elements per date. (Running OlsonNames() gives a list of time zones.)

table(as.Date(time(z), tz = "Etc/GMT+6"))

giving:

2016-11-09 2016-11-10 2016-11-11 2016-11-12 2016-11-13 2016-11-14 2016-11-15 2016-11-16 2016-11-17 
        24         24         24         24         24         24         24         24         24 

Thus we can replace tz="EST" used in the question with tz="Etc/GMT=6" giving this code:

aggregate(z, as.Date(index(z), tz='Etc/GMT+6'), 
     FUN=rank, ties.method = "first", na.last = FALSE)

which results in:

2016-11-09  6  4 3 2 1 5 20 19 16 14 15 17 13 11 10  9 21 23 24 22 18 12  8  7
2016-11-10  6  5 2 3 1 4 21 22 20 18 16 14 11 10  8  9 19 23 24 17 12 13  7 15
2016-11-11  8 10 4 6 5 1 17 20 19 18 15 14 11 12  7  9 22 24 23 21 16 13  3  2
2016-11-12  6  4 1 3 2 5  9 12 17 21 18 16 14 13 10 11 19 22 23 24 20  8 15  7
2016-11-13 14  9 7 3 2 1  4  5 10 16 18 17 15 13 11 12 20 22 24 23 21 19  8  6
2016-11-14  5  4 3 1 2 6 12 15 17 19 18 16 14 13 10 11 21 23 24 22 20  9  8  7
2016-11-15  6  4 3 1 2 5 14 20 16 19 18 15 12 13 10 11 21 23 24 22 17  9  8  7
2016-11-16  7  4 3 1 2 6 17 20 15 19 16 14 13 12 11 10 21 23 24 22 18  9  8  5
2016-11-17  7  4 3 1 2 5 14 18 17 19 15 16 13 12 10 11 21 23 24 22 20  9  8  6

3) Noon Another way to approach this if we know that (i) we are only concerned with date/times in the New York and GMT time zones and (ii) the data begins at 1am and has consecutive hours is to use the the datetimes at position 12, 12+24, 12+48, etc. since noon will be the same day in either of the two aforementioned time zones.

aggregate(z, as.Date(index(z)[rep(seq(12, length(z), 24), each = 24)]),
     FUN=rank, ties.method = "first", na.last = FALSE)

4) Subtract 1 second The poster commented that the problem is that his hours represent the end of a one hour event so the hours for a day are 01, 02, ..., 24 but 24 would be regarded by R as the beginning of the next day. We can handle tht by subtracting one second. That keeps the last hour on the same day as the other hours.

aggregate(z, as.Date(index(z)-1, tz = "EST5EDT"),
     FUN=rank, ties.method = "first", na.last = FALSE)