I have som data in a data.table that I want to to sum. Trivial example supplied below. I start with (variable) n-columns (char) and some group variables.
I can convert them (nsk-vars in my example) to numeric/integer beforehand, if that's the way to go. I would like to be able to use a string of columnames to create a new column which is the sum of these(nsk1-nsk3) and handle NA:s. How do I do this in a resonable way?
x <- data.table(a = c("GrpA", "GrpB"),
nsk1 = c("1","3"),
nsk2 = c(NA,"1"),
nsk3 = c("3", "4"))
ClNamesStr <- colnames(x)
ClNamesStr <- ClNamesStr[grepl("^nsk", ClNamesStr)]
nskStrSum <- paste(ClNamesStr, collapse = "),as.numeric(")
nskStrSum <- paste("as.numeric(", nskStrSum, "), na.rm = TRUE")
x[, nsk0 := sum(eval(nskStrSum)), by = a]
a nsk1 nsk2 nsk3 nsk0
GrpA 1 NA 3 4
GrpB 3 1 4 7
Here is one way to go.
x[, nsk0 := rowSums(sapply(.SD, function(i) as.numeric(i)), na.rm=TRUE), .SDcols=grep("^nsk", names(x)), by = a]
x a nsk1 nsk2 nsk3 othr nsk0 1: GrpA 1 NA 3 a 4 2: GrpA 3 1 4 b 8
sapply runs through the selected variables, converts each to a numeric and returns a matrix. This matrix is passed to
rowSums which sums up the values in each row and ignores the
.SD is a shorthand to select of the data.table, (except variables contained in "by"). However, the variables are selected among them using
.SDcols along with
Some data.tablers might not be happy with this solution as it converts some of the data into a matrix. However, this conversion may be necessary in your case.
I added an additional "nuisance" variable to show that it will work.
x <- data.table(a = c("GrpA", "GrpA"), nsk1 = c("1","3"), nsk2 = c(NA,"1"), nsk3 = c("3", "4"), othr = letters[1:2])