Alexander - 25 days ago 9

R Question

I would like to to get the sum of the rows after first occurrence of a certain number. In this case it is

`'10'`

I though If we can know the row number after first occurrence and the ending row number of that group and we can sum in between them.

I can get the first occurrence of '10' each group but I don't know how can get the sum of the rows.

`df <- data.frame(gr=rep(c(1,2),c(7,9)),`

y_value=c(c(0,0,10,8,8,6,0),c(0,0,10,10,5,4,2,0,0)))

> df

gr y_value

1 1 0

2 1 0

3 1 10

4 1 8

5 1 8

6 1 6

7 1 0

8 2 0

9 2 0

10 2 10

11 2 10

12 2 5

13 2 4

14 2 2

15 2 0

16 2 0

My initial attempt is below which is not working for some reason even for grouping part:(!

`library(dplyr)`

df%>%

group_by(gr)%>%

mutate(check1=any(y_value==10),row_sum=which(y_value == 10)[1])

Expected output

`> df`

gr y_value sum_rows_range

1 1 0 22/4

2 1 0 22/4

3 1 10 22/4

4 1 8 22/4

5 1 8 22/4

6 1 6 22/4

7 1 0 22/4

8 2 0 21/6

9 2 0 21/6

10 2 10 21/6

11 2 10 21/6

12 2 5 21/6

13 2 4 21/6

14 2 2 21/6

15 2 0 21/6

16 2 0 21/6

Answer Source

A `dplyr`

solution:

```
library(dplyr)
df %>%
group_by(gr) %>%
slice(if(any(y_value == 10)) (which.max(y_value == 10)+1):n() else row_number()) %>%
summarize(sum = sum(y_value),
rows = n()) %>%
inner_join(df)
```

**Notes:**

The main idea is to `slice`

on the rows after the first 10 occurs. `any(y_value == 10))`

and `else row_number()`

are just to take care of the case where there are no 10's in `y_value`

.

Reading the documentation for `?which.max`

, you will notice that when it is applied to a logical vector, in this case `y_value == 10`

, "with both `FALSE`

and `TRUE`

values, `which.min(x)`

and `which.max(x)`

return the index of the first `FALSE`

or `TRUE`

, respectively, as `FALSE < TRUE`

."

In other words, `which.max(y_value == 10)`

will give the index of the first occurrence of 10. By adding 1 to it, I can start `slice`

ing from the value right after the first occurrence of 10.

**Result:**

```
# A tibble: 16 × 4
gr sum rows y_value
<dbl> <dbl> <int> <dbl>
1 1 22 4 0
2 1 22 4 0
3 1 22 4 10
4 1 22 4 8
5 1 22 4 8
6 1 22 4 6
7 1 22 4 0
8 2 21 6 0
9 2 21 6 0
10 2 21 6 10
11 2 21 6 10
12 2 21 6 5
13 2 21 6 4
14 2 21 6 2
15 2 21 6 0
16 2 21 6 0
```