panman - 5 days ago
R Question

R: Function arguments and lapply nested in a function or called from external function with data.table

I am still new to data.table and to working with environments.

I have a data.table similar to this (although much larger):

library(data.table)

mydt <- data.table(ID = c("a", "a", "a", "b", "b", "b"),
                   col1 = c(1, 2, 3, 4, 5, 6),
                   col2 = c(7, 8, 9, 10, 11, 12),
                   key = "ID")


I wrote a function that takes mydt, splits it into a list of data.tables by its key, and then, in each table of that list, multiplies a column named by the user in one argument by a number supplied in another argument:

myfun <- function(data, constant, column) {
  data <- split(x = data, by = key(data))
  data <- lapply(data, function(i) {
    i[ , (column) := get(column)*constant]
  })
  return(data)
}

x <- myfun(data = mydt, constant = 3, column = "col1")

x

$a
   ID col1 col2
1:  a    3    7
2:  a    6    8
3:  a    9    9

$b
   ID col1 col2
1:  b   12   10
2:  b   15   11
3:  b   18   12


If I understand R's scoping rules correctly, lapply will look in the environment it was called in, find the column and constant provided as arguments to myfun, and use them.
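
To make the scoping point concrete: strictly speaking, it is not lapply that finds column and constant. The anonymous function is created inside myfun, so its enclosing environment is myfun's execution environment, and R resolves the function's free variables there. A minimal base-R sketch of this, where scale_inside and values are names made up for illustration:

```r
# Hypothetical illustration; not part of the original code.
scale_inside <- function(values, constant) {
  # The anonymous function is defined here, so its enclosing environment
  # is this call's frame, which is where `constant` lives.
  lapply(values, function(v) v * constant)
}

res <- scale_inside(list(1, 2, 3), constant = 3)
str(res)
```

The same lookup is what lets the data.table version above see column and constant without receiving them explicitly.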

However, the function passed to lapply is much longer and more complex than the one here, and it will be used in other functions that do many other things besides splitting the data.table. This is why I would like to define this part as an external function that is called from within other functions. This is what I did:

split.dt <- function(data) {
  split(data, by = key(data))
}

mult <- function(data) {
  lapply(data, function(i) {
    i[ , (column) := get(column)*constant]
  })
}

myfun <- function(data, constant, column) {
  data <- split.dt(data = data)
  data <- mult(data = data)
}

x <- myfun(data = mydt, constant = 3, column = "col1")


An error is returned:

Error in eval(expr, envir, enclos) : object 'column' not found
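
The error follows from lexical scoping. mult is defined at top level, so free variables used inside it are looked up in mult's own frame and then in the environment where mult was defined; the frame of the function that called it (myfun) is never searched. A small base-R sketch of the same situation, using hypothetical stand-in names and assuming no object called constant exists at top level:

```r
# Hypothetical stand-ins for mult and myfun, without data.table.
scale_outside <- function(values) {
  # `constant` is a free variable: R looks in this frame, then in the
  # environment where scale_outside was defined, never in the caller's frame.
  lapply(values, function(v) v * constant)
}

wrapper <- function(values, constant) scale_outside(values)

msg <- tryCatch(
  {
    wrapper(list(1, 2, 3), constant = 3)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # expected to report that 'constant' was not found
```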


I tried wrapping column inside the mult function, as in i[ , eval(column)], in combination with parent.frame() and parent.env(), without any success. In the end I reached a solution that uses sys.call to retrieve the arguments passed to myfun and then uses them in mult, like this:

split.dt <- function(data) {
  split(data, by = key(data))
}

mult <- function(data) {
  supplied.col <- sys.call(which = -1)[["column"]]
  supplied.constant <- sys.call(which = -1)[["constant"]]
  lapply(data, function(i) {
    i[ , eval(supplied.col) := get(supplied.col)*supplied.constant]
  })
}

myfun <- function(data, constant, column) {
  data <- split.dt(data = data)
  data <- mult(data = data)
}

x <- myfun(data = mydt, constant = 3, column = "col1")

x

$a
   ID col1 col2
1:  a    3    7
2:  a    6    8
3:  a    9    9

$b
   ID col1 col2
1:  b   12   10
2:  b   15   11
3:  b   18   12


It does work, but I am not sure about two things:


  1. Is this the right or most efficient approach? What is the proper way to make mult look up the arguments supplied to myfun?

  2. Will this work if the functions are wrapped in a package?
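
On the first question, one caveat with sys.call is worth checking: it returns the caller's call as it was written, so the arguments are unevaluated expressions rather than values. The sketch below uses hypothetical base-R functions, inner and outer, in place of mult and myfun.

```r
# Hypothetical stand-ins: inner plays the role of mult, outer of myfun.
inner <- function() {
  # The caller's call as written; its arguments are not evaluated.
  sys.call(which = -1)[["column"]]
}
outer <- function(constant, column) inner()

# Called with a literal, the string itself is captured.
from_literal <- outer(constant = 3, column = "col1")

# Called with a variable, the variable's name is captured instead.
my_col <- "col1"
from_variable <- outer(constant = 3, column = my_col)

class(from_literal)   # a character string, usable as a column name
class(from_variable)  # a symbol (class "name"), not the string it refers to
```

So the approach works for the literal call shown in the question, but as soon as a caller passes the column name through a variable, mult would receive the symbol my_col rather than "col1". Passing the values as ordinary arguments avoids this.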


Answer

1) Just pass column and constant to mult as additional arguments.

mult <- function(data, constant, column) {
  lapply(data, function(i) {
    i[ , (column) := get(column)*constant]
  })
}

myfun <- function(data, constant, column) {
  data <- split.dt(data = data)
  data <- mult(data, constant, column)
}
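
Combined with the data from the question, option 1 can be run end to end. This is only a sketch: it assumes the data.table package is installed, reuses split.dt from the question, lets myfun return its result visibly, and passes the column name through a variable called target, a name introduced here for illustration.

```r
library(data.table)

mydt <- data.table(ID = c("a", "a", "a", "b", "b", "b"),
                   col1 = c(1, 2, 3, 4, 5, 6),
                   col2 = c(7, 8, 9, 10, 11, 12),
                   key = "ID")

# Split a keyed data.table into a list of data.tables, one per key value.
split.dt <- function(data) {
  split(data, by = key(data))
}

# Everything mult needs arrives as an ordinary argument.
mult <- function(data, constant, column) {
  lapply(data, function(i) {
    i[ , (column) := get(column) * constant]
  })
}

myfun <- function(data, constant, column) {
  data <- split.dt(data = data)
  mult(data, constant, column)
}

# The column name can come from a variable and be passed positionally.
target <- "col2"
x <- myfun(mydt, 3, target)
x$a$col2
x$b$col2
```

Because nothing depends on how the caller wrote the call, this form does not care whether the functions live in a script or in a package.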

2) Alternatively, define mult so that its body is evaluated in the caller's environment:

mult <- function(data, envir = parent.frame()) with(envir, 
  lapply(data, function(i) {
    i[ , (column) := get(column)*constant]
  })
)

2a) or copy the two values out of the caller's environment explicitly:

mult <- function(data, envir = parent.frame()) {
  constant <- envir$constant
  column <- envir$column
  lapply(data, function(i) {
    i[ , (column) := get(column)*constant]
  })
}
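
Options 2 and 2a differ in one respect that a base-R sketch can show. With with(envir, ...), the whole expression is evaluated in the caller's frame, so even the name data is looked up there rather than taken from mult's own argument; this works in myfun because it also has a variable called data. Option 2a reads only column and constant from the caller. The function names below are hypothetical analogues, not the answer's code.

```r
# Hypothetical base-R analogues of options 2 and 2a (no data.table needed).

# Option 2 pattern: evaluate the whole expression in the caller's frame.
scale_with <- function(pieces, envir = parent.frame()) {
  with(envir, lapply(pieces, function(v) v * constant))
}

# Option 2a pattern: read only `constant` from the caller's frame.
scale_copy <- function(pieces, envir = parent.frame()) {
  constant <- envir$constant
  lapply(pieces, function(v) v * constant)
}

# A caller whose local names match: both variants work.
caller_same <- function(pieces, constant) {
  list(with = scale_with(pieces), copy = scale_copy(pieces))
}
same <- caller_same(list(1, 2), constant = 10)

# A caller that names its list differently: `pieces` is not in its frame,
# so the with() variant fails while the copying variant still works.
caller_other <- function(x, constant) {
  list(
    with = tryCatch(scale_with(x), error = function(e) "error"),
    copy = scale_copy(x)
  )
}
other <- caller_other(list(1, 2), constant = 10)
str(other)
```

Both variants still rely on the caller using particular variable names, which is why option 1 is usually the easier one to reason about.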