user1320502 -4 years ago 27
R Question

# Difference between `%in%` and `==`

``````
df <- structure(list(x = 1:10, time = c(0.5, 0.5, 1, 2, 3, 0.5, 0.5,
1, 2, 3)), .Names = c("x", "time"), row.names = c(NA, -10L), class = "data.frame")

df[df$time %in% c(0.5, 3), ]
##     x time
## 1   1  0.5
## 2   2  0.5
## 5   5  3.0
## 6   6  0.5
## 7   7  0.5
## 10 10  3.0

df[df$time == c(0.5, 3), ]
##     x time
## 1   1  0.5
## 7   7  0.5
## 10 10  3.0
``````

What is the difference between `%in%` and `==` here?

The problem is vector recycling.

Your first line does exactly what you'd expect. It checks which elements of `df$time` are in `c(0.5, 3)` and keeps the rows where that is true.
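To make this concrete, here is a small sketch using the `time` column from the question, rebuilt as a standalone vector (the names `time` and `keep` are chosen for illustration only). `%in%` returns one logical value per element of its left-hand side, and that logical vector is what selects the rows.

```r
# The time column from the question, as a standalone vector
time <- c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3)

# One TRUE/FALSE per element of `time`: is it found anywhere in c(0.5, 3)?
keep <- time %in% c(0.5, 3)
keep
# TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE

# Positions of the TRUE values, matching the row names in the first output
which(keep)
# 1 2 5 6 7 10
```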

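Your second line is trickier. `==` compares element by element and recycles the shorter vector to the length of the longer one, so it's actually equivalent to

``````
df[df$time == rep(c(0.5, 3), length.out = nrow(df)), ]
``````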
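One way to check this equivalence is to compare the two logical vectors directly. The sketch below rebuilds the data frame from the question with `data.frame()` (an assumption made for brevity; the values are the same), and the names `short` and `recycled` are illustrative.

```r
# Same data as in the question, built with data.frame() for brevity
df <- data.frame(x = 1:10, time = c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3))

# Comparison against the length-2 vector, recycled implicitly by ==
short <- df$time == c(0.5, 3)

# The same comparison with the recycling written out explicitly
recycled <- df$time == rep(c(0.5, 3), length.out = nrow(df))

identical(short, recycled)
# TRUE

# Only positions where time happens to line up with the recycled pattern
which(short)
# 1 7 10
```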

To see this, look at what happens if you use the vector `rep(0.5, 10)` instead:

``````
rep(0.5, 10) == c(0.5, 3)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
``````

See how it returns `TRUE` at every odd position. Essentially it's comparing 0.5 against the recycled vector `c(0.5, 3, 0.5, 3, 0.5, ...)`.

You can even construct a vector that produces no matches this way. Take `rep(c(3, 0.5), 5)`:

``````
rep(c(3, 0.5), 5) == c(0.5, 3)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
``````

They're all `FALSE`. Every 3 is compared with 0.5 and every 0.5 with 3, so no pair is ever equal.
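If you do want to use `==` for this kind of filter, one option is to compare against each value separately and combine the results with `|`, so that only a single value is recycled in each comparison. A minimal sketch, using a standalone `time` vector and illustrative names `by_equals` and `by_in`:

```r
# The time column from the question, as a standalone vector
time <- c(0.5, 0.5, 1, 2, 3, 0.5, 0.5, 1, 2, 3)

# Each comparison recycles one value, so every element is tested against it
by_equals <- time == 0.5 | time == 3

# The set-membership version from the question
by_in <- time %in% c(0.5, 3)

identical(by_equals, by_in)
# TRUE
```

For more than a couple of values, `%in%` is usually the more readable choice.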
