monodeferia - 1 month ago
R Question

Subsetting a data.cube inside a custom function

I am trying to make a function of my own to subset a data.cube in R, and format the result automatically for some predefined plots I aim to build.

This is my function.

require(data.table)
require(data.cube)

secciona <- function(cubo = NULL,
                     fecha_valor = list(),
                     loc_valor = list(),
                     prod_valor = list(),
                     drop = FALSE){

  cubo[fecha_valor, loc_valor, prod_valor, drop = drop]

  ## The line above will really be an assignment of the form y <- format(cubo[...drop])
  ## The rest of the code will end up plotting the subset
}


The thing is I keep on getting the error:
Error in eval(expr, envir, enclos) : object 'fecha_valor' not found


What I find strangest is that everything works fine on the console, but not when the same subset is done inside my own function.

In console:

> dc[list(as.Date("2013/01/01"))]
> dc[list(as.Date("2013/01/01")),]
> dc[list(as.Date("2013/01/01")),,]
> dc[list(as.Date("2013/01/01")),list(),list()]


all give the following result:

<data.cube>
fact:
5627 rows x 2 dimensions x 1 measures (0.32 MB)
dimensions:
localizacion : 4 entities x 3 levels (0.01 MB)
producto : 153994 entities x 3 levels (21.29 MB)
total size: 21.61 MB


But whenever I try

secciona(dc)
secciona(dc, fecha_valor = list(as.Date("2013/01/01")))
secciona(dc, fecha_valor = list())


I always get the error mentioned above.

Any ideas why this is happening? Should I take a different approach to preparing the subset for plotting?

Answer

This is a standard issue that R users face when dealing with non-standard evaluation. It is a consequence of R's "computing on the language" feature.
The [.data.cube function expects to be used interactively. That makes the arguments passed to it more flexible, but it also imposes some restrictions. In that respect it is similar to [.data.table when expressions are passed from a wrapper function to the [ subset operator. I've added a dummy example to make this reproducible.
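The failure mode can be reproduced in base R alone. The sketch below is a toy, not how [.data.cube is actually implemented: nse_pick, toy, wrapper and cutoff are hypothetical names made up for illustration. It assumes an NSE-style function that captures its index argument unevaluated and then evaluates it somewhere that cannot see the wrapper's local variables.

```r
# Toy stand-in for an NSE-style subset function (hypothetical; NOT the
# data.cube implementation). It captures `i` unevaluated and evaluates it
# against the columns of `df`, then base R and the global environment,
# but never the frame of the function that called it.
nse_pick <- function(df, i) {
  i_expr <- substitute(i)
  rows <- eval(i_expr, df, baseenv())
  df[rows, , drop = FALSE]
}

toy <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A direct call with a literal works, much like the console calls in the question.
direct <- nse_pick(toy, x > 1)

# Inside a wrapper, the captured expression still mentions `cutoff`, a name
# that exists only in the wrapper's frame, so the lookup fails.
wrapper <- function(df, cutoff) nse_pick(df, x > cutoff)
msg <- tryCatch(wrapper(toy, 1), error = function(e) conditionMessage(e))
print(msg)
```

The error message names the wrapper's argument, which matches the pattern of the 'fecha_valor' not found error in the question.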

I see you are already using the data.cube-oop branch. To clarify for other readers: the data.cube-oop branch is 92 commits ahead of the master branch. To install it, use the following.

install.packages("data.cube", repos = paste0("https://", c(
    "jangorecki.gitlab.io/data.cube",
    "Rdatatable.github.io/data.table",
    "cran.rstudio.com"
)))

library(data.cube)
set.seed(1)
ar = array(rnorm(8,10,5), rep(2,3), 
           dimnames = list(color = c("green","red"), 
                           year = c("2014","2015"), 
                           country = c("IN","UK"))) # sorted
dc = as.data.cube(ar)

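f = function(color=list(), year=list(), country=list(), drop=FALSE){
    # Build the call with the argument values substituted in for the
    # placeholders .color, .year, .country and .drop, so that [.data.cube
    # sees values rather than names local to this function.
    expr = substitute(
        dc[color=.color, year=.year, country=.country, drop=.drop],
        list(.color=color, .year=year, .country=country, .drop=drop)
    )
    # dc is not a placeholder, so it is looked up when the call is evaluated.
    eval(expr)
}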
f(year=list(c("2014","2015")), country="UK")
#<data.cube>
#fact:
#  4 rows x 3 dimensions x 1 measures (0.00 MB)
#dimensions:
#  color : 2 entities x 1 levels (0.00 MB)
#  year : 2 entities x 1 levels (0.00 MB)
#  country : 1 entities x 1 levels (0.00 MB)
#total size: 0.01 MB

You can inspect the expression by putting print(expr) before, or instead of, eval(expr).
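As a base R illustration of the same substitute-then-eval pattern, here is a minimal sketch that runs without data.cube. The names nse_pick, toy, wrapper_fixed, cutoff and verbose are hypothetical, and nse_pick is only a toy NSE function, not the real [.data.cube.

```r
# Toy NSE-style function (hypothetical): evaluates the captured index
# expression against the columns of `df`, without access to the caller's frame.
nse_pick <- function(df, i) {
  i_expr <- substitute(i)
  rows <- eval(i_expr, df, baseenv())
  df[rows, , drop = FALSE]
}

toy <- data.frame(x = 1:3)

wrapper_fixed <- function(df, cutoff, verbose = FALSE) {
  # Replace the placeholder .cutoff with the value of `cutoff`, so the
  # call no longer refers to a name local to this wrapper.
  expr <- substitute(nse_pick(df, x > .cutoff), list(.cutoff = cutoff))
  if (verbose) print(expr)  # show the call that is about to be evaluated
  eval(expr)
}

res <- wrapper_fixed(toy, 1, verbose = TRUE)
print(res)
```

With verbose = TRUE the printed call contains the value 1 where the placeholder was, which is the kind of check print(expr) makes possible.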

Read more about non-standard evaluation:
- R Language Definition: Computing on the language
- Advanced R: Non-standard evaluation
- manual of substitute function
And some related SO questions:
- Passing on non-standard evaluation arguments to the subset function
- In R, why is [ better than subset?
