J Giancarlo - 1 year ago
R Question

Using data.table function in lapply on a list with data.frames elements

This is my first question; please let me know in the comments if more information or background is needed.

Many answers here and elsewhere deal with calling lapply inside a data.table call. I want to do the opposite, which on paper should be easy:

lapply(list.of.dfs, function(x) x)
but I can't get it to work with data.table functions.

I have a list containing several data.frames with the same columns but different numbers of rows. They come from the output of several simulation scenarios, so they must be treated separately and not rbind'ed.

# sample list of data.frames
scenarios <- replicate(5, data.frame(a = sample(letters[1:4], 10, TRUE),
                                     b = sample(1:2, 10, TRUE),
                                     x = sample(1:10, 10),
                                     y = runif(10)),
                       simplify = FALSE)


I want to add a column to every element containing the sum of x/y within each group defined by a and b.

Following the Examples section of the data.table documentation (search the help page for "add new column by reference by group"), the process for a single data.frame is:

library(data.table)
test <- as.data.table(scenarios[[1]])  # := only works on a data.table, so convert first
test[, newcol := sum(x/y), by = .(a, b)][]
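To make the by-group behaviour concrete, here is a minimal sketch on a small hand-made table (the toy data is an assumption, not the question's scenarios): every row receives the sum of x/y over all rows sharing its a and b values.

```r
library(data.table)

# Hypothetical toy data: two rows in group (p, 1), one row in group (q, 2)
dt <- data.table(a = c("p", "p", "q"),
                 b = c(1, 1, 2),
                 x = c(2, 6, 3),
                 y = c(1, 2, 3))

# := adds the column by reference; by = .(a, b) computes sum(x/y) per group
# and repeats that group total on every row of the group
dt[, newcol := sum(x/y), by = .(a, b)]

# Group (p, 1): 2/1 + 6/2 = 5; group (q, 2): 3/3 = 1
print(dt)
```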


I want to use lapply to do the same thing to every element in the scenarios list and return the list.
My most recent attempt:

lapply(scenarios, function(i) {as.data.table(i[, z := sum(x/y), by=.(a,b)]); i})


but I keep getting the error
unused argument (by = .(a,b))
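The same message can be reproduced with a tiny plain data.frame (hypothetical toy data below), which suggests the by argument never reaches data.table at all:

```r
library(data.table)

# Hypothetical toy data: a plain data.frame, not yet converted to a data.table
df <- data.frame(a = c("p", "q"), b = c(1, 2), x = c(1, 2), y = c(1, 1))

# Because df is a data.frame, `[` dispatches to `[.data.frame`, which has
# no `by` argument, so the call fails during argument matching
msg <- tryCatch(df[, z := sum(x/y), by = .(a, b)],
                error = conditionMessage)
print(msg)
print(inherits(df, "data.table"))  # FALSE
```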


After poring over the results on this and other sites, I have been unable to solve this problem, which I'm fairly sure means there is something I don't understand about calling anonymous functions and/or using data.table functions. Is this one of those cases where you use [ as the function? Or is my as.data.table possibly in the wrong place?

This answer was a step in the right direction (I think); it covers using function(x) {...; x} to write an anonymous function that returns x.
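For reference, a minimal base R sketch of the function(x) {...; x} pattern on a list of data.frames (the list dfs and the column double_x are hypothetical names used only for illustration):

```r
# Two hypothetical data.frames standing in for the simulation scenarios
dfs <- list(data.frame(x = 1:3), data.frame(x = 4:5))

# The anonymous function modifies its local copy of each element,
# then returns it explicitly as the last expression
out <- lapply(dfs, function(d) {
  d$double_x <- d$x * 2
  d
})

print(out[[2]])
```

Without the trailing d, each element of out would be the value of the last assignment (the new column vector) rather than the modified data.frame. The originals in dfs are left unchanged, because base R modifies a copy.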

Thanks!

lmo
Answer

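You can use setDT here instead. Your attempt fails because i is still a plain data.frame when [ is called, so R dispatches to [.data.frame, which has no by argument; as.data.table() would only be applied to the result afterwards. Convert each element first: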

scenarios <- lapply(scenarios, function(i) {setDT(i); i[, z := sum(x/y), by=.(a,b)]})

scenarios[[1]]
   a b  x          y         z
 1: c 2  2 0.87002174  2.298793
 2: b 2 10 0.19720775 78.611837
 3: b 2  8 0.47041670 78.611837
 4: b 2  4 0.36705023 78.611837
 5: a 1  5 0.78922686 12.774035
 6: a 1  6 0.93186209 12.774035
 7: b 1  3 0.83118438  3.609307
 8: c 1  1 0.08248658 30.047494
 9: c 1  7 0.89382050 30.047494
10: c 1  9 0.89172831 30.047494
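A self-contained sketch of the setDT approach with a cross-check; the set.seed() call and the comparison against base R's ave() are additions for illustration only, not part of the original answer:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the run is reproducible
scenarios <- replicate(5, data.frame(a = sample(letters[1:4], 10, TRUE),
                                     b = sample(1:2, 10, TRUE),
                                     x = sample(1:10, 10),
                                     y = runif(10)),
                       simplify = FALSE)

# Convert each element by reference, then add z by group; `[` with :=
# returns the modified data.table, so lapply collects one per scenario
scenarios <- lapply(scenarios, function(i) {
  setDT(i)
  i[, z := sum(x/y), by = .(a, b)]
})

# Cross-check z for the first scenario against base R group sums
s1 <- scenarios[[1]]
check <- ave(s1$x / s1$y, s1$a, s1$b, FUN = sum)
print(all.equal(s1$z, check))
```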

Using as.data.table, the syntax would be

scenarios <- lapply(scenarios, function(i) {
  i <- as.data.table(i)
  i[, z := sum(x/y), by = .(a, b)]
})

but this is not recommended, because as.data.table() creates an additional copy of each data.frame, which setDT() avoids by converting the object in place.
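A minimal sketch of the difference, on a hypothetical toy data.frame: as.data.table() returns a new object and leaves the input alone, while setDT() changes the input itself.

```r
library(data.table)

# Hypothetical toy data.frame
df <- data.frame(x = 1:3)

# as.data.table() returns a new object; df itself stays a data.frame
dt <- as.data.table(df)
print(inherits(df, "data.table"))  # FALSE
print(inherits(dt, "data.table"))  # TRUE

# setDT() converts df by reference instead of returning a converted copy
setDT(df)
print(inherits(df, "data.table"))  # TRUE
```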
