Max M Max M - 2 months ago 15
R Question

Sum by on the fly factor over many columns

I would like to calculate sums by cut values for several columns.
I know how to do this manually for each column, but I am struggling finding a decent way to automate the function for all columns. Usually I would use my function with lapply, but I chose to do it with data.table and I could not figure out how to use char values.

I was hoping for a list of data.tables with the sums for each category or a matrix/data.table with a first column for each column variable and the following columns as the categories, like

data.table("v1","v2"), low=c( 1185.3074,1175.7261 ), high=c( 1175.726,350.3937 ))


if(!require(data.table)) { install.packages("data.table"); require(data.table)}

DT[,sum(v1, na.rm = T), by=cut(DT[,v1], breaks=c(0,25,50), labels = c("low", "high"))]
DT[,sum(v2, na.rm = T), by=cut(DT[,v2], breaks=c(0,25,50), labels = c("low", "high"))]


I guess one standard way would be to reshape twice:

  variable ~ cut(value, c(0,25,50), c("low","high")), 
  fun = sum

#    variable      low     high
# 1:       v1 323.2453 1216.937
# 2:       v2 331.0626 1122.991

melt reshapes to "long"; while dcast reverts to "wide."