Max M - 1 month ago 5
R Question

# Sum by on the fly factor over many columns

I would like to calculate sums by cut values for several columns.
I know how to do this manually for each column, but I am struggling finding a decent way to automate the function for all columns. Usually I would use my function with lapply, but I chose to do it with data.table and I could not figure out how to use char values.

I was hoping for a list of data.tables with the sums for each category or a matrix/data.table with a first column for each column variable and the following columns as the categories, like

``````data.table(col.name=c("v1","v2"), low=c( 1185.3074,1175.7261 ), high=c( 1175.726,350.3937 ))
``````

MWE

``````rm(list=ls())
if(!require(data.table)) { install.packages("data.table"); require(data.table)}
set.seed(123)
DT<-data.table(v1=runif(50,10,50),v2=runif(50,10,50))

DT[,sum(v1, na.rm = T), by=cut(DT[,v1], breaks=c(0,25,50), labels = c("low", "high"))]
DT[,sum(v2, na.rm = T), by=cut(DT[,v2], breaks=c(0,25,50), labels = c("low", "high"))]
``````

I guess one standard way would be to reshape twice:

``````dcast(
melt(DT),
variable ~ cut(value, c(0,25,50), c("low","high")),
fun = sum
)

#    variable      low     high
# 1:       v1 323.2453 1216.937
# 2:       v2 331.0626 1122.991
``````

`melt` reshapes to "long"; while `dcast` reverts to "wide."

Source (Stackoverflow)