mth_mad - 2 months ago
R Question

`sapply` works on a "zoo" object but not on an "xts" object. Why?

Here is an example that shows a clear difference between "zoo" and "xts".

library(xts)

mydf = as.data.frame(replicate(6, sample(1:10, 10, replace = TRUE)))
myzoo = zoo(mydf, order.by = Sys.Date() + 1:10)
resultzoo = sapply(myzoo, function(x) x+1)


Although I lose the dates (a behaviour already commented on here), the code above works fine. However, the code below gives an error:

myxts = xts(mydf, order.by = Sys.Date() + 1:10)
resultxts = sapply(myxts, function(x) x+1)
# Error in array(r, dim = d, dimnames = if (!(is.null(n1 <- names(x[[1L]])) & :
# length of 'dimnames' [1] not equal to array extent


I cannot find any explanation for this odd behaviour. Any ideas are welcome.

Answer

I think you have raised a very good question. Before answering it, I would like to point out that you can use

sapply(myzoo, "+", 1)
sapply(myxts, "+", 1)

instead of

sapply(myzoo, function (x) x + 1)
sapply(myxts, function (x) x + 1)

This is because "+" is already a function. Try 1 + 2 and "+"(1, 2).
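As a small base R illustration of that point (the vector `v` is just made-up example data), the two spellings below should give identical results, because any extra arguments to sapply are passed on to the function:

```r
# An operator is an ordinary function and can be called by name.
a <- 1 + 2
b <- "+"(1, 2)
same_scalar <- identical(a, b)

# Extra arguments after the function are forwarded to it,
# so the anonymous wrapper is not needed.
v <- c(1, 2, 3)                      # made-up example data
r1 <- sapply(v, function(x) x + 1)
r2 <- sapply(v, "+", 1)
same_vector <- identical(r1, r2)
```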


sapply works in two stages. The first stage is an ordinary call to lapply; the second stage is a call to simplify2array, which simplifies the result. The error message you get indicates that something goes wrong in the second stage. Indeed, if we try:

x1 <- lapply(myzoo, "+", 1)
x2 <- lapply(myxts, "+", 1)

we get no error at all!
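The two stages can be reproduced by hand on a toy input. This is only a sketch in base R: the function `f` is a made-up example, and `higher = FALSE` is the setting that corresponds to the default simplify = TRUE.

```r
# Made-up example function: returns a length-2 vector for each input.
f <- function(i) c(i, i^2)

# Stage 1: lapply always returns a list and never simplifies.
stage1 <- lapply(1:3, f)

# Stage 2: simplify2array binds the list elements into a 2 x 3 matrix.
stage2 <- simplify2array(stage1, higher = FALSE)

# The manual two-stage result matches what sapply returns.
same <- identical(stage2, sapply(1:3, f))
```

If stage 1 succeeds and the combined call fails, the problem must lie in the simplification step.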

However, if we compare x1 and x2, we will see the difference. For neatness I will just extract the first list element:

x1[[1]]

#2016-09-30 2016-10-01 2016-10-02 2016-10-03 2016-10-04 2016-10-05 2016-10-06 
#         3          4          5          7          2          2          4 
#2016-10-07 2016-10-08 2016-10-09 
#         3          5          3 

x2[[1]]

#           V1
#2016-09-30  3
#2016-10-01  4
#2016-10-02  5
#2016-10-03  7
#2016-10-04  2
#2016-10-05  2
#2016-10-06  4
#2016-10-07  3
#2016-10-08  5
#2016-10-09  3

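Ah, for the "zoo" object the dimension is dropped, so each element is a vector; for the "xts" object the dimension is not dropped, so each element is a single-column matrix!

This is exactly why sapply fails. By default sapply uses simplify = TRUE, which tries to collapse the list into a 1D vector or a 2D matrix. For x1 this is no problem: each element is a vector of length 10 with no dim attribute, so the six elements become the columns of a 10 x 6 matrix. For x2 each element is a 10 x 1 matrix with its own dimnames, and the simplification step cannot build consistent dimnames for the result, hence the "length of 'dimnames' [1] not equal to array extent" error.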
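The shape difference can also be checked directly rather than read off the printout. The sketch below rebuilds the objects from the question (the `set.seed` call is an addition, only there for reproducibility) and looks at the dim attribute of the first list element in each case:

```r
library(xts)  # also attaches zoo

set.seed(1)   # added here for reproducibility; not in the question
mydf  <- as.data.frame(replicate(6, sample(1:10, 10, replace = TRUE)))
myzoo <- zoo(mydf, order.by = Sys.Date() + 1:10)
myxts <- xts(mydf, order.by = Sys.Date() + 1:10)

x1 <- lapply(myzoo, "+", 1)
x2 <- lapply(myxts, "+", 1)

d1 <- dim(x1[[1]])  # no dim attribute: a univariate series
d2 <- dim(x2[[1]])  # a 10 x 1 single-column matrix
```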

If we use a more permissive setting, simplify = "array", sapply succeeds in both cases:

  1. sapply(myzoo, "+", 1, simplify = "array") gives a 2D array (i.e., a matrix you see);
  2. sapply(myxts, "+", 1, simplify = "array") gives a 3D array.
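The same contrast shows up in base R whenever the function returns a matrix. In this sketch `make_block` is a made-up helper: the default setting flattens each matrix into a column, while simplify = "array" keeps the matrix shape and stacks along a new third dimension.

```r
# Made-up helper: a 2 x 2 matrix filled with the value i.
make_block <- function(i) matrix(i, nrow = 2, ncol = 2)

# Default simplify = TRUE: each 2 x 2 matrix is flattened to a column of 4.
m <- sapply(1:3, make_block)

# simplify = "array": the 2 x 2 shape is kept, giving a 2 x 2 x 3 array.
a <- sapply(1:3, make_block, simplify = "array")

dim_m <- dim(m)
dim_a <- dim(a)
```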

From this example, we can see that sapply is not always desirable. Why not use the following:

y1 <- do.call(cbind, x1)
y2 <- do.call(cbind, x2)

#           V1 V2 V3 V4 V5 V6
#2016-09-30  3  8  6  4 11  3
#2016-10-01  4  3  9  2  5  7
#2016-10-02  5  7  9  7  7 10
#2016-10-03  7  2  5  3  5  3
#2016-10-04  2  6  7  2  4  5
#2016-10-05  2  2 11  2  4  7
#2016-10-06  4  3 10 10  8  2
#2016-10-07  3  6  4  5  9  4
#2016-10-08  5  4 10 10  3  8
#2016-10-09  3  3 11  8 11  7

They give the same output, and you get dates as row names! What is more, the original object class is respected!

class(y1)
# [1] "zoo"

class(y2)
# [1] "xts" "zoo"
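If this pattern is needed often, it can be wrapped in a small helper. The name `apply_columns` is hypothetical and not part of zoo or xts; this is a minimal sketch that assumes the function returns a series of the same length for every column.

```r
library(xts)  # also attaches zoo

# Hypothetical helper: apply f to each column, then bind the pieces
# back together with cbind so the index and class are kept.
apply_columns <- function(x, f, ...) {
  do.call(cbind, lapply(x, f, ...))
}

set.seed(1)   # added here for reproducibility
mydf  <- as.data.frame(replicate(6, sample(1:10, 10, replace = TRUE)))
myxts <- xts(mydf, order.by = Sys.Date() + 1:10)

y <- apply_columns(myxts, "+", 1)
```

Here `y` should still be an "xts" object with the same dates as `myxts`, with every value increased by 1.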