MFR MFR - 1 month ago 6
R Question

How do I remove the data after an event?

This is my data. I want to remove all the data for an ID after an event:

ID Event time
1 0 1
1 1 2
2 0 3
1 0 4
2 0 5


Since the event for ID 1 was greater than 0, I'd like to remove all later rows for ID 1. That means removing row 4, so my ideal output would be:

ID Event time
1 0 1
1 1 2
2 0 3
2 0 5


How can I do that?

dput(df)
structure(list(ID = c(1L, 1L, 2L, 1L, 2L), Event = c(0L, 1L, 0L, 0L, 0L),
    time = 1:5), .Names = c("ID", "Event", "time"), class = "data.frame",
    row.names = c(NA, -5L))

Answer

With dplyr, you can group by ID and keep the rows whose time is less than or equal to the earliest time at which Event is 1:

library(dplyr)

df %>% group_by(ID) %>% filter(time <= min(time[Event == 1]))

## Source: local data frame [4 x 3]
## Groups: ID [2]
## 
##      ID Event  time
##   <int> <int> <int>
## 1     1     0     1
## 2     1     1     2
## 3     2     0     3
## 4     2     0     5

Instead of using time, you could compare row positions within each group, using seq_along or row_number with which. In base R, ave can handle the grouping, but it passes only one vector to its function, so a seq_along approach is simpler than working with time:

df[as.logical(ave(df$Event, df$ID, FUN = function(x) {
    seq_along(x) <= min(which(x == 1))
})), ]

##   ID Event time
## 1  1     0    1
## 2  1     1    2
## 3  2     0    3
## 5  2     0    5
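
The row_number variant mentioned above is not shown in the answer. A minimal sketch, assuming dplyr is installed and df is the data frame from the question, might look like this. It compares each row's position within its ID group to the position of the first row where Event is 1.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  ID    = c(1L, 1L, 2L, 1L, 2L),
  Event = c(0L, 1L, 0L, 0L, 0L),
  time  = 1:5
)

# Keep rows up to and including the first row with Event == 1 in each ID group.
# For an ID with no event, which() is empty and min() returns Inf (with a
# warning), so all of that ID's rows are kept.
res <- df %>%
  group_by(ID) %>%
  filter(row_number() <= min(which(Event == 1))) %>%
  ungroup()
```

Unlike the time-based filter, this version relies on the rows already being in chronological order within each ID.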

The approaches above depend on the fact that min() of a zero-length vector, such as min(numeric(0)), returns Inf (with a warning) when an ID has no rows where Event is 1, so all of that ID's rows are kept. Add an if condition to handle that case explicitly, if you like.
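
The explicit check could be written in several ways. One possible sketch in base R follows. The helper name keep_through_first_event is hypothetical and not part of the answer. It returns TRUE for every row when an ID has no event, so min() is never called on an empty vector and no warning is raised.

```r
# Data frame from the question
df <- data.frame(
  ID    = c(1L, 1L, 2L, 1L, 2L),
  Event = c(0L, 1L, 0L, 0L, 0L),
  time  = 1:5
)

# Hypothetical helper: flags rows up to and including the first event
# in one ID's Event vector.
keep_through_first_event <- function(x) {
  first <- which(x == 1)
  if (length(first) == 0) {
    # No event for this ID: keep every row
    return(rep(TRUE, length(x)))
  }
  seq_along(x) <= first[1]
}

# ave() stores the logical result in an integer vector, so convert it back
keep <- as.logical(ave(df$Event, df$ID, FUN = keep_through_first_event))
res <- df[keep, ]
```

As in the answer's base R version, this works on row positions, so it assumes the rows for each ID are already sorted by time.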