Max M - 1 year ago 74

R Question

I would like to calculate sums by cut values for several columns.

I know how to do this manually for each column, but I am struggling finding a decent way to automate the function for all columns. Usually I would use my function with lapply, but I chose to do it with data.table and I could not figure out how to use char values.

I was hoping for a list of data.tables with the sums for each category or a matrix/data.table with a first column for each column variable and the following columns as the categories, like

`data.table(col.name=c("v1","v2"), low=c( 1185.3074,1175.7261 ), high=c( 1175.726,350.3937 ))`

MWE

`rm(list=ls())`

if(!require(data.table)) { install.packages("data.table"); require(data.table)}

set.seed(123)

DT<-data.table(v1=runif(50,10,50),v2=runif(50,10,50))

DT[,sum(v1, na.rm = T), by=cut(DT[,v1], breaks=c(0,25,50), labels = c("low", "high"))]

DT[,sum(v2, na.rm = T), by=cut(DT[,v2], breaks=c(0,25,50), labels = c("low", "high"))]

Answer Source

I guess one standard way would be to reshape twice:

```
dcast(
melt(DT),
variable ~ cut(value, c(0,25,50), c("low","high")),
fun = sum
)
# variable low high
# 1: v1 323.2453 1216.937
# 2: v2 331.0626 1122.991
```

`melt`

reshapes to "long"; while `dcast`

reverts to "wide."