Levi Brackman - 9 months ago 40

R Question

I am trying to find the top cumulative amount of rows of a column up to a specified number. So if I have this data set

`df <- data.frame(x = rnorm(26, 10, 2))`

and I want the number of rows from the top that sums to 100. I have tried using variations of this:

`df %>% top_n(6)`

But that does not work. Is there a function that can do this easily that I am missing?

The use case for this is if I have a frequency column that gives me the percentage that an observation appears in a dataset after conducting group_by and desc functions and then I want the top, say 15%, of all observations.

Thanks in advance.

Answer Source

The trick is to use the cumsum function, then narrow the frame to the values of cumsum < 100 and compute nrow on that result:

This will do it:

```
df <- data.frame(x = rnorm(26, 10, 2))
df$accum <- cumsum(df$x)
df
nrow(df[df$accum<100,])
```

Result:

```
x accum
1 10.045870 10.04587
2 6.626029 16.67190
3 9.585552 26.25745
4 3.311654 29.56911
5 12.247780 41.81689
6 7.973587 49.79047
7 12.194685 61.98516
8 12.415279 74.40044
9 8.617609 83.01805
10 10.339062 93.35711
11 11.976633 105.33374
12 7.301745 112.63549
13 9.247254 121.88274
14 5.697414 127.58015
15 9.721602 137.30176
16 12.371765 149.67352
17 10.231773 159.90529
18 10.474578 170.37987
19 13.562808 183.94268
20 7.419556 191.36224
21 7.613043 198.97528
22 9.416050 208.39133
23 10.047952 218.43928
24 9.788021 228.22730
25 10.709967 238.93727
26 8.766313 247.70358
10
```