Shape - 3 months ago
R Question

data.table expand sub list use names by reference

I have a table that I've created from JSON data. It has some nested list columns that I'd like to expand into their own columns. (It also has embedded null values, which is why I'm using do.call(rbind, list) rather than rbindlist.)

Example Data:

library(data.table)

# Make some sample JSON
rawjson <- lapply(1:10, function(x) {
  list(stats = list(stat1 = sample(LETTERS, 1),
                    stat2 = sample(LETTERS, 1),
                    stat3 = NULL),
       othervar = runif(1))
})

# Convert to data.table (rbind yields a list matrix, one row per record)
dtjson <- data.table(do.call(rbind, rawjson))


When we check the output, we see that we have a list column called stats:


> dtjson
     stats  othervar
 1: <list> 0.6980694
 2: <list> 0.1696928
 3: <list> 0.6168877
 4: <list> 0.4322135
 5: <list> 0.6941624
 6: <list> 0.3354516
 7: <list> 0.7159235
 8: <list> 0.2019412
 9: <list> 0.8908848
10: <list> 0.4643908


Now, I can turn that stats column inside out with purrr::transpose:


library(purrr)
> dtjson[,purrr::transpose(stats)]
    stat1 stat2 stat3
 1:     U     G  NULL
 2:     J     X  NULL
 3:     D     E  NULL
 4:     F     V  NULL
 5:     V     W  NULL
 6:     Z     I  NULL
 7:     R     O  NULL
 8:     A     H  NULL
 9:     L     R  NULL
10:     A     M  NULL


But, I'm at a loss as to how to assign each of these new columns by reference.

I tried:

> dtjson[,names(purrr::transpose(stats)) := purrr::transpose(stats)]
Error in transpose(stats) : object 'stats' not found


On the other hand, this works:

dtjson[, paste0('V',1:3) := purrr::transpose(stats)]


but it requires me to know beforehand exactly how many columns will result from transpose(stats), which I may not know until I actually transpose stats. Preferably, I'd also like to keep the names as defined inside the list column, whatever they are.

Is there any way to use the names that the list already has to assign by reference?

EDIT: transpose from purrr was doing the job, not data.table::transpose

Answer

You can do

dtjson[, names(s <- purrr::transpose(dtjson$stats)) := s]
rm(s)

I borrowed this from @MichaelChirico's post on the data.table issue tracker.
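The original attempt most likely fails because data.table evaluates the left-hand side of := in the calling scope rather than inside the table, so a bare stats is not found there, while dtjson$stats is. As a minimal sketch of the same idea without the inline assignment, assuming sample data shaped like the question's (new_cols is just an illustrative name):

```r
library(data.table)
library(purrr)

# Sample data in the same shape as the question's example
rawjson <- lapply(1:10, function(x) {
  list(stats = list(stat1 = sample(LETTERS, 1),
                    stat2 = sample(LETTERS, 1),
                    stat3 = NULL),
       othervar = runif(1))
})
dtjson <- data.table(do.call(rbind, rawjson))

# Transpose once, outside the data.table call, so the names are available
new_cols <- purrr::transpose(dtjson$stats)

# The parentheses tell data.table that the LHS is an expression
# returning column names; the new columns are added by reference
dtjson[, (names(new_cols)) := new_cols]

names(dtjson)
```

This leaves new_cols behind in the workspace just as the one-liner leaves s, so the trade-off is readability rather than tidiness.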


An alternative, not relying on s being an unused variable name, is

dtjson[, names(dtjson$stats[[1]]) := purrr::transpose(stats)]
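Note that this takes the names from the first row only. By default purrr::transpose also bases its output on the first element of the list (its .names argument overrides that), so the two should agree, but it may be worth checking that every row carries the same names before assigning. A rough sketch, again assuming data shaped like the question's example (first_names and same_names are illustrative names):

```r
library(data.table)
library(purrr)

# Sample data in the same shape as the question's example
rawjson <- lapply(1:5, function(x) {
  list(stats = list(stat1 = sample(LETTERS, 1),
                    stat2 = sample(LETTERS, 1),
                    stat3 = NULL),
       othervar = runif(1))
})
dtjson <- data.table(do.call(rbind, rawjson))

# Names taken from the first row only
first_names <- names(dtjson$stats[[1]])

# Check that every row's list has the same names, in the same order
same_names <- all(vapply(dtjson$stats,
                         function(x) identical(names(x), first_names),
                         logical(1)))
stopifnot(same_names)

# Safe to assign by reference using the first row's names
dtjson[, (first_names) := purrr::transpose(stats)]
```

If the check fails, passing the full set of names via .names to purrr::transpose is one option to consider.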

Hopefully there will be a better way to go about this eventually.