soitgoes soitgoes - 1 month ago 5
R Question

Assign value in all rows following satisfaction of first instance in R

I've got a dataframe with a set of numeric observations (Value) per grouping variable (ID). I'm looking for an elegant way to do the following in a new column per grouping variable (ID): if Value is -40 or lower, assign a value of 1 starting from the row that follows the first instance of reaching -40 or lower. Every Value preceding the -40 or lower (and including that first instance of -40 or lower) should be assigned something other than a 1 (i.e., assign 0).

Example data:

+----+-------+-------+
| ID | Order | Value |
+----+-------+-------+
| 1 | 1 | -40 |
| 1 | 2 | 32 |
| 1 | 3 | -59 |
| 1 | 4 | -35 |
| 2 | 1 | 47 |
| 2 | 2 | 14 |
| 2 | 3 | 0 |
| 3 | 1 | 10 |
| 3 | 2 | 63 |
| 3 | 3 | -32 |
| 3 | 4 | -46 |
| 3 | 5 | -27 |
| 3 | 6 | -42 |
| 3 | 7 | 45 |
+----+-------+-------+


I am looking for something to accomplish this (below):

+----+-------+-------+-------------+
| ID | Order | Value | After_Neg40 |
+----+-------+-------+-------------+
| 1 | 1 | 32 | 0 |
| 1 | 2 | -40 | 0 |
| 1 | 3 | -59 | 1 |
| 1 | 4 | -35 | 1 |
| 2 | 1 | 47 | 0 |
| 2 | 2 | 14 | 0 |
| 2 | 3 | 0 | 0 |
| 3 | 1 | 10 | 0 |
| 3 | 2 | 63 | 0 |
| 3 | 3 | -32 | 0 |
| 3 | 4 | -46 | 0 |
| 3 | 5 | -27 | 1 |
| 3 | 6 | -42 | 1 |
| 3 | 7 | 45 | 1 |
+----+-------+-------+-------------+


I tried searching for this type of problem on SO without much luck, but I also had a hard time knowing how to describe this type of problem (maybe it has already been answered, but my search terms may not have uncovered it). If you have any elegant ways to solve this, I would appreciate your help. Thanks!

Answer

Using data.table, assuming that the data is in a data frame df:

library(data.table)
setDT(df)[, After_Neg40:=ifelse(!is.na(the.row <- which(Value <= -40)[1]) & (1:.N) > the.row,1,0), by=ID][]
##    ID Order Value After_Neg40
## 1:  1     2    32           0
## 2:  1     1   -40           0
## 3:  1     3   -59           1
## 4:  1     4   -35           1
## 5:  2     1    47           0
## 6:  2     2    14           0
## 7:  2     3     0           0
## 8:  3     1    10           0
## 9:  3     2    63           0
##10:  3     3   -32           0
##11:  3     4   -46           0
##12:  3     5   -27           1
##13:  3     6   -42           1
##14:  3     7    45           1

The logic is:

  1. Find the first row for which Value is less than or equal to 40 using which(Value <= -40)[1]. Set this to the.row.
  2. If there are no rows for which this condition is true for a group by ID, then the.row will return NA, so we check that with is.na.
  3. So, if there is such a the.row (i.e., is.na returns FALSE) and for those rows that are greater than the.row, set the value to 1, else 0. Do this with an ifelse.

The result matches your posted desired output, but uses the following data, which switches the first two rows of your posted input data:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L), Order = c(2L, 1L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L), Value = c(32L, -40L, -59L, -35L, 47L, 14L, 
0L, 10L, 63L, -32L, -46L, -27L, -42L, 45L)), .Names = c("ID", 
"Order", "Value"), class = "data.frame", row.names = c(NA, -14L
))
##   ID Order Value
##1   1     2    32
##2   1     1   -40
##3   1     3   -59
##4   1     4   -35
##5   2     1    47
##6   2     2    14
##7   2     3     0
##8   3     1    10
##9   3     2    63
##10  3     3   -32
##11  3     4   -46
##12  3     5   -27
##13  3     6   -42
##14  3     7    45