BigFinger - 1 year ago 73
R Question

How do I do a complex computation on columns and rows of a data table?

I am learning the syntax of manipulating data.table variables. Although I can do simple things, my understanding is not thorough enough for more complex tasks. For example, I would like to transform the following data to have one distinct "type" value per row, separate columns generated based on the value of "subtype", and unique values collapsed when there are multiple rows with the same "type/subtype" combination.

Given the input data:

``````data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)

var1 var2 subtype type
1    a   aa       1    A
2    b   bb       2    A
3    c   cc       2    A
4    b   dd       2    A
5    d   ee       1    B
6    e   ee       1    B
7    f   ff       2    B
``````

I would like to derive:

``````  1.var1 1.var2 2.var1 2.var2     2.type
A "a"    "aa"   "b|c"  "bb|cc|dd" "A"
B "d|e"  "ee"   "f"    "ff"       "B"
``````

Using a data frame, I can achieve this with the following code:

``````data.derived = do.call(
rbind,
lapply(
split(data,list(data\$type)),
function(x) {
do.call (
c,
lapply(
split(x, list(x\$subtype)),
function(y) {
result = c(
var1 = paste(unique(y\$var1),collapse ="|"),
var2 = paste(unique(y\$var2),collapse ="|")
)
if (as.character(y\$subtype[1]) == "2") {
result = c(result, type = as.character(y\$type[1]))
}
result}))}))
``````

How can I do the same using a data table?

From your result, it's clear to see that you are transforming data from long to wide format and subtype is spread along the row direction, so you will need `dcast` from `data.table`. And since your want to aggregate your values from `var1` and `var2` to be a single string, you will need to customize the aggregate function as `paste` to collapse the result:
``````library(data.table)