user1165199 - 7 days ago
R Question

R: Issue when trying to use 'match' inside mutate for a grouped tbl

I have a data frame with an id column and a boolean event column:

x <- data.frame(id = c(0,0,0,1,1,1,2,2,2,2,3,3,3),
                event = c(FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE))


For each id I want to create a column next to it with the position where the event is first TRUE. So for id 0 there are no TRUEs so I get NA, for id 1 the first element is TRUE so I get 1, for id 2 I get 2, and for id 3 I also get 2.

Expected output:

      id event event_num
   (dbl) (lgl)     (int)
1      0 FALSE        NA
2      0 FALSE        NA
3      0 FALSE        NA
4      1  TRUE         1
5      1 FALSE         1
6      1 FALSE         1
7      2 FALSE         2
8      2  TRUE         2
9      2 FALSE         2
10     2 FALSE         2
11     3 FALSE         2
12     3  TRUE         2
13     3  TRUE         2


To try and get this I use the code:

library(dplyr)
x %>% group_by(id) %>% mutate(event_num = match(TRUE, event))


However this gives me

      id event event_num
   (dbl) (lgl)     (int)
1      0 FALSE        NA
2      0 FALSE        NA
3      0 FALSE        NA
4      1  TRUE         1
5      1 FALSE         1
6      1 FALSE         1
7      2 FALSE        NA
8      2  TRUE        NA
9      2 FALSE        NA
10     2 FALSE        NA
11     3 FALSE         2
12     3  TRUE         2
13     3  TRUE         2


i.e. id 2 has NA instead of 2.

EDIT
I updated dplyr from 0.4.3 to 0.5.0 and the code above now gives the expected output.
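For anyone who cannot upgrade dplyr, the same column can probably be built in base R. The following is only a sketch, not something from the original post: it assumes the x data frame from the question, and first_pos is a made-up helper name.

```r
# Rebuild the data frame from the question so the sketch is self-contained
x <- data.frame(id = c(0,0,0,1,1,1,2,2,2,2,3,3,3),
                event = c(FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,
                          TRUE,FALSE,FALSE,FALSE,TRUE,TRUE))

# Position of the first TRUE within each id (NA when a group has no TRUE);
# first_pos is a hypothetical helper name, indexed by id as a character label
first_pos <- tapply(x$event, x$id, function(e) which(e)[1])

# Look up each row's group result and drop the array attributes
x$event_num <- as.integer(first_pos[as.character(x$id)])
x
```

This avoids the grouped mutate entirely, so it does not depend on the dplyr version installed.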

Answer

We can use which and choose the first appearance:

library(dplyr)
x %>% group_by(id) %>% mutate(event_num = which(event)[1])
# Source: local data frame [13 x 3]
# Groups: id [4]
# 
#       id event event_num
#    <dbl> <lgl>     <int>
# 1      0 FALSE        NA
# 2      0 FALSE        NA
# 3      0 FALSE        NA
# 4      1  TRUE         1
# 5      1 FALSE         1
# 6      1 FALSE         1
# 7      2 FALSE         2
# 8      2  TRUE         2
# 9      2 FALSE         2
# 10     2 FALSE         2
# 11     3 FALSE         2
# 12     3  TRUE         2
# 13     3  TRUE         2

We can also use match(TRUE, event), but I usually avoid matching boolean values against vectors.
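To see how the two expressions compare on a single group, here is a small check in plain R, outside dplyr. The vectors no_event and late_event are made-up names holding the event values for id 0 and id 2 from the question.

```r
# Hypothetical per-group vectors copied from the question's data
no_event   <- c(FALSE, FALSE, FALSE)         # id 0: no TRUE at all
late_event <- c(FALSE, TRUE, FALSE, FALSE)   # id 2: first TRUE at position 2

# which() returns every TRUE position; [1] keeps the first, or NA if none exist
which_none <- which(no_event)[1]
which_late <- which(late_event)[1]

# match() returns the position of the first element equal to TRUE, or NA
match_none <- match(TRUE, no_event)
match_late <- match(TRUE, late_event)

print(c(which_none, match_none))
print(c(which_late, match_late))
```

On a plain logical vector both approaches should agree, which fits the question's EDIT: the NA for id 2 went away after upgrading dplyr.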
