BigFinger BigFinger - 3 months ago 10
R Question

Computing on multiple column names in a data.table

I have a sparse data table that looks like this:

data = data.table(
var1 = c("a","",""),
var2 = c("","","c"),
var3 = c("a","b",""),
var4 = c("","b","")
)
var1 var2 var3 var4
1: a a
2: b b
3: c


I would like to add a column that contains a string of zeros and ones indicating which variables are present in any row, like this:

var1 var2 var3 var4 concat
1: a a 1|0|1|0
2: b b 0|0|1|1
3: c 0|1|0|0


I can get to this with the following command:

data[, concat := paste(
as.integer(var1 != ""),
as.integer(var2 != ""),
as.integer(var3 != ""),
as.integer(var4 != ""),
sep = "|")]


However, if I have hundreds of variables, I would rather use some sort of computation to get to the desired expression. Perhaps something based on
paste0("var",1:4)
, or at least a vector of column names. Any suggestions?

Answer

Same basic approach as above:

data[ , concat := apply(.SD, 1, function(x) paste(+(x == ""), collapse = "|"))][]
#    var1 var2 var3 var4  concat
# 1:    a         a      0|1|0|1
# 2:              b    b 1|1|0|0
# 3:         c           1|0|1|1