Alexander - 6 months ago 30
R Question

# Summing rows after first occurance of a certain number

I would like to to get the sum of the rows after first occurrence of a certain number. In this case it is

`'10'`
for instance.

I though If we can know the row number after first occurrence and the ending row number of that group and we can sum in between them.

I can get the first occurrence of '10' each group but I don't know how can get the sum of the rows.

``````df <- data.frame(gr=rep(c(1,2),c(7,9)),
y_value=c(c(0,0,10,8,8,6,0),c(0,0,10,10,5,4,2,0,0)))

> df
gr y_value
1   1       0
2   1       0
3   1      10
4   1       8
5   1       8
6   1       6
7   1       0
8   2       0
9   2       0
10  2      10
11  2      10
12  2       5
13  2       4
14  2       2
15  2       0
16  2       0
``````

My initial attempt is below which is not working for some reason even for grouping part:(!

``````library(dplyr)
df%>%
group_by(gr)%>%
mutate(check1=any(y_value==10),row_sum=which(y_value == 10)[1])
``````

Expected output

``````> df
gr y_value sum_rows_range
1   1       0      22/4
2   1       0      22/4
3   1      10      22/4
4   1       8      22/4
5   1       8      22/4
6   1       6      22/4
7   1       0      22/4
8   2       0      21/6
9   2       0      21/6
10  2      10      21/6
11  2      10      21/6
12  2       5      21/6
13  2       4      21/6
14  2       2      21/6
15  2       0      21/6
16  2       0      21/6
``````

A `dplyr` solution:

``````library(dplyr)
df %>%
group_by(gr) %>%
slice(if(any(y_value == 10)) (which.max(y_value == 10)+1):n() else row_number()) %>%
summarize(sum = sum(y_value),
rows = n()) %>%
inner_join(df)
``````

Notes:

The main idea is to `slice` on the rows after the first 10 occurs. `any(y_value == 10))` and `else row_number()` are just to take care of the case where there are no 10's in `y_value`.

Reading the documentation for `?which.max`, you will notice that when it is applied to a logical vector, in this case `y_value == 10`, "with both `FALSE` and `TRUE` values, `which.min(x)` and `which.max(x)` return the index of the first `FALSE` or `TRUE`, respectively, as `FALSE < TRUE`."

In other words, `which.max(y_value == 10)` will give the index of the first occurrence of 10. By adding 1 to it, I can start `slice`ing from the value right after the first occurrence of 10.

Result:

``````# A tibble: 16 × 4
gr   sum  rows y_value
<dbl> <dbl> <int>   <dbl>
1      1    22     4       0
2      1    22     4       0
3      1    22     4      10
4      1    22     4       8
5      1    22     4       8
6      1    22     4       6
7      1    22     4       0
8      2    21     6       0
9      2    21     6       0
10     2    21     6      10
11     2    21     6      10
12     2    21     6       5
13     2    21     6       4
14     2    21     6       2
15     2    21     6       0
16     2    21     6       0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download