Chris - 3 months ago

R - add column checking occurrence of something in last n rows of column

I want to create a new column where, at each row, TRUE is returned if a certain value is found within the previous n rows of another column (not counting the current row), and FALSE is returned otherwise.

Here is an example dataframe (suppose this is a sample from a much larger dataframe):

A
2
23
1
5
6
15
14
3
7
9
55
3
77
2


And here is what I want, where the value to look for is 1 and n = 10:

A B
2 FALSE
23 FALSE
1 FALSE
5 TRUE
6 TRUE
15 TRUE
14 TRUE
3 TRUE
7 TRUE
9 TRUE
55 TRUE
3 TRUE
77 TRUE
2 FALSE


I can do this with many "OR" conditions inside ifelse in a dplyr mutate:

df <- df %>% mutate(B = ifelse(lag(A) == 1 | lag(A, 2) == 1 ... | lag(A, 10) == 1, T, F))


But this is far too tedious, especially when n is large. Also, lag in dplyr only accepts a single integer as its offset, so lag(A, 1:10) doesn't work.
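A minimal sketch of one workaround that stays within dplyr, assuming dplyr is installed: because lag takes one offset at a time, call it once per offset with lapply and OR the lagged copies together with Reduce. The helper name seen_in_last_n is hypothetical. default = FALSE is used so that rows with fewer than n predecessors give FALSE rather than NA, which is what the plain lag(A) == 1 chain would return for the first rows.

```r
library(dplyr)

df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))

# Hypothetical helper: TRUE where `target` occurs among the n values before x[i]
seen_in_last_n <- function(x, target, n) {
  hit <- x == target
  # One lagged copy of `hit` per offset 1..n; default = FALSE pads the start instead of NA
  lags <- lapply(seq_len(n), function(k) dplyr::lag(hit, k, default = FALSE))
  # OR all lagged copies together; init keeps the result well-defined when n = 0
  Reduce("|", lags, init = rep(FALSE, length(x)))
}

df <- df %>% mutate(B = seen_in_last_n(A, target = 1, n = 10))
df
```

This still calls lag n times under the hood, so it is a loop in disguise; it only spares you from writing the conditions out by hand.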

Is there an easy way to do this (preferably without a for loop)?

Answer

As you've noticed, lag from dplyr does not accept a vector of shift amounts. The shift function from data.table does: it provides the same functionality as lag and lead in dplyr, and with n = 1:10 it returns a list of ten lagged vectors. You can then combine those vectors with Reduce and "|", so that a row is TRUE if any of its previous ten values equals 1:

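library(data.table)
df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))
setDT(df)  # convert df to a data.table in place
# shift() with n = 1:10 returns a list of 10 lagged copies of (A == 1);
# fill = FALSE marks positions before the first row as "not found"
df[, B := Reduce("|", shift(A == 1, n = 1:10, fill = FALSE))]
df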
#     A     B
# 1:  2 FALSE
# 2: 23 FALSE
# 3:  1 FALSE
# 4:  5  TRUE
# 5:  6  TRUE
# 6: 15  TRUE
# 7: 14  TRUE
# 8:  3  TRUE
# 9:  7  TRUE
#10:  9  TRUE
#11: 55  TRUE
#12:  3  TRUE
#13: 77  TRUE
#14:  2 FALSE
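For comparison, here is a different technique sketched in base R, not part of the answer above: keep a running count of matches with cumsum, then subtract the count up to row i - n - 1 from the count up to row i - 1. A positive difference means the value occurred somewhere in rows i - n to i - 1, and the cost does not grow with n. The helper name seen_in_last_n_cumsum is hypothetical, and the sketch assumes A contains no NA values (an NA would propagate through cumsum to every later row).

```r
df <- data.frame(A = c(2, 23, 1, 5, 6, 15, 14, 3, 7, 9, 55, 3, 77, 2))

# Hypothetical helper: TRUE where `target` occurs among the n values before x[i].
# Assumes x has no NA values.
seen_in_last_n_cumsum <- function(x, target, n) {
  len <- length(x)
  m <- cumsum(x == target)                       # m[i]: matches in x[1..i]
  upto_prev <- c(0, m)[seq_len(len)]             # matches in x[1..i-1]
  upto_old <- c(rep(0, n + 1), m)[seq_len(len)]  # matches in x[1..i-n-1]
  (upto_prev - upto_old) > 0                     # any match in x[(i-n)..(i-1)]
}

df$B <- seen_in_last_n_cumsum(df$A, target = 1, n = 10)
df
```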