usct01 -4 years ago 144
R Question

# split rows into 10 groups each having same total of a value

I have a data with 2 columns, ID and Revenue. I want to create column which will divide the data in 10 groups each having 10% of the total revenue. The quantile method gives me 10 groups with equal number of ID not revenue.

idrev[ , decile := cut(Revenue,
breaks = quantile(Revenue, probs = seq(0, 1, by = 1/10)),
labels = 1:10, right = FALSE)]

I get following type result

N Revenue %Revenue
100 \$3,992 80%
100 \$518 10%
100 \$236 5%
100 \$126 3%
100 \$68 1%
100 \$35 1%
100 \$16 0%
100 \$6 0%
100 \$2 0%
100 \$1 0%
1,000 \$5,000 100%

while I am looking for this result

N Revenue %Revenue
798 500 10%
104 500 10%
47 500 10%
25 500 10%
14 500 10%
7 500 10%
3 500 10%
2 500 10%
1 500 10%
1 500 10%
1,000 \$5,000 100%

Please suggest a solution for this in R.

Adding a code to get sample data and stats

library(Hmisc);library(data.table)
set.seed(123)
idrev<-data.table(ID=1:1000, Revenue=sample(100,1000,replace=T))
idrev[,.(.N,sum(Revenue))] #Check total revenue
idrev[ , decile := cut2(Revenue,g=10)]
idrev[,.(.N,sum(Revenue)),by=decile][order(decile)]

Here is a data.table only method that should get you there:

idrev[order(Revenue), revDec := 10 * ceiling(10 * (cumsum(Revenue) / sum(Revenue)))]

This is a straight forward calculation of the deciles after ordering the rows by revenue.

Here is the result of summing the revenue by revDec:

idrev[, .(Revenue=sum(Revenue)), by="revDec"]
revDec Revenue
1:     10    5004
2:     70    5070
3:     20    5039
4:     80    5025
5:     90    4974
6:     30    4974
7:     40    5059
8:     50    5026
9:    100    5091
10:     60    4960

They're all pretty close to 5000.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download