BigFinger BigFinger - 3 months ago 7
R Question

How do I do a complex computation on columns and rows of a data table?

I am learning the syntax of manipulating data.table variables. Although I can do simple things, my understanding is not thorough enough for more complex tasks. For example, I would like to transform the following data to have one distinct "type" value per row, separate columns generated based on the value of "subtype", and unique values collapsed when there are multiple rows with the same "type/subtype" combination.

Given the input data:

data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)

var1 var2 subtype type
1 a aa 1 A
2 b bb 2 A
3 c cc 2 A
4 b dd 2 A
5 d ee 1 B
6 e ee 1 B
7 f ff 2 B


I would like to derive:

1.var1 1.var2 2.var1 2.var2 2.type
A "a" "aa" "b|c" "bb|cc|dd" "A"
B "d|e" "ee" "f" "ff" "B"


Using a data frame, I can achieve this with the following code:

data.derived = do.call(
rbind,
lapply(
split(data,list(data$type)),
function(x) {
do.call (
c,
lapply(
split(x, list(x$subtype)),
function(y) {
result = c(
var1 = paste(unique(y$var1),collapse ="|"),
var2 = paste(unique(y$var2),collapse ="|")
)
if (as.character(y$subtype[1]) == "2") {
result = c(result, type = as.character(y$type[1]))
}
result}))}))


How can I do the same using a data table?

Answer

From your result, it's clear to see that you are transforming data from long to wide format and subtype is spread along the row direction, so you will need dcast from data.table. And since your want to aggregate your values from var1 and var2 to be a single string, you will need to customize the aggregate function as paste to collapse the result:

library(data.table)
setDT(data)
dcast(data, type ~ subtype, value.var = c("var1", "var2"), 
            fun = function(v) paste0(unique(v), collapse = "|"))

#    type var1_function_1 var1_function_2 var2_function_1 var2_function_2
# 1:    A               a             b|c              aa        bb|cc|dd
# 2:    B             d|e               f              ee              ff
Comments