user85727 - 1 year ago 81
R Question

# Selecting factors based upon distance between two rows in a column

I am trying to select the ids from the data frame below that have values of 2 in the num column that are at least 2 rows apart. In other words, given the following data frame:

``````df2 <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3),
num=c(1,2,1,1,2,1,1,1,2,2,1,1,1,2,2))
df2\$id <- as.factor(df2\$id)
``````

How can I select the id variables such that values of 2 are separated by at least one row?

The result of the procedure I am seeking would select id 1, since the values of 2 are separated by at least one row (2 rows in this example). Thanks.

Also, the solution below doesn't seem to work on the following example:

``````df <- data.frame(
id=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
num=c(1,2,1, 1,2,2, 1,1,1,2,2,1, 1,1,1,2,2,1, 1,2,1,2,2,2)
)
df\$id<-as.factor(df\$id)
``````

Again, the result should be only id 1. I just want to find, per ID, any instances where the number 2 is separated by another 2 by one or more rows.

data.table I would use the data.table package:

``````library(data.table)
setDT(df)

df[num == 2, .N > 1L && any(diff(.I) > 2L), by=id]

id    V1
1:  1  TRUE
2:  2 FALSE
3:  3 FALSE
``````

How it works: The syntax is `DT[i, j, by]`

• `i` subsets rows
• then `by` groups the remaining rows
• then `j` is computed

Inside `j` we have `.I`, which stores row number for convenience; and `.N`, the number of rows in the `by` group. So to extract the ids have `V1 == TRUE`, we can do `res[V1 == TRUE, id]`. Alternately, the whole operation can be chained like

``````df[num == 2, .N > 1L && any(diff(.I) > 2L), by=id][V1 == TRUE, id]

[1] 1
Levels: 1 2 3
``````

base or dplyr The (almost) analogue in base R is

``````sapply(split(df\$num == 2, df\$id), function(x){
w = which(x)
length(w) > 1L && any(diff(w) > 2L)
})

1     2     3
TRUE FALSE FALSE
``````

The latter also works in dplyr

``````library(dplyr)

df %>% group_by(id) %>% summarise(x = {
w = which(num == 2)
length(w) > 1L && any(diff(w) > 2L)
})

# A tibble: 3 x 2
id     x
<fctr> <lgl>
1      1  TRUE
2      2 FALSE
3      3 FALSE
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download