Tom Dale - 1 month ago 5x

R Question

I have a data frame with three columns and thousands of rows. The first two columns (x and y) contain character strings, and the third (z) contains numeric data. I need to subset the data frame based on matching values in both of the first two columns.

```
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
z <- c(1:10)
df <- data.frame(x, y, z)
```

```
   x y  z
1  a h  1
2  b b  2
3  c k  3
4  d a  4
5  f g  5
6  g d  6
7  h i  7
8  i c  8
9  j f  9
10 k j 10
```

Say this is my table, and the values I am interested in are "a", "c", "f", "h" and "k". I only want to return the rows in which both x and y contain one of the five, so in this case rows 1 and 3.

I've tried:

```
df2 <- filter(df,
  x == ("a" | "c" | "f" | "h" | "k") &
  y == ("a" | "c" | "f" | "h" | "k"))
```

but this doesn't work for factors or character strings. Is there an equivalent or another way around this?

Thanks in advance.

Answer

I think this returns what you are looking for:

```
# build vector of necessary elements
mustHaves <- c("a", "c", "f", "h", "k")
# perform subsetting
df[with(df, x %in% mustHaves & y %in% mustHaves), ]
#>   x y z
#> 1 a h 1
#> 3 c k 3
```
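To see why the original attempt fails and what `%in%` does instead, here is a minimal sketch. The names `vals`, `wanted` and `or_attempt` are made up for illustration and are not part of the question's data.

```r
wanted <- c("a", "c", "f", "h", "k")
vals   <- c("a", "b", "c")   # hypothetical small vector for illustration

# `|` is a logical operator; applied to character strings it raises an error,
# which is caught here so the script keeps running
or_attempt <- tryCatch("a" | "c", error = function(e) "error")

# %in% tests each element of the left-hand vector for membership in `wanted`
# and returns one logical value per element
in_wanted <- vals %in% wanted
in_wanted
#> [1]  TRUE FALSE  TRUE
```

Because `%in%` returns one logical per row, two such tests can be combined with `&` to keep only the rows where both columns match.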

**data**

```
df <- data.frame(x, y, z, stringsAsFactors = FALSE)
```
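Since the question used `filter()`, the same condition should also work there. The sketch below rebuilds the example data and shows a base R `subset()` call next to a `dplyr::filter()` call. It assumes the `filter()` in the question is dplyr's, and the dplyr part only runs if that package is installed. The names `res_base` and `res_dplyr` are made up for illustration.

```r
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
z <- 1:10
df <- data.frame(x, y, z, stringsAsFactors = FALSE)

mustHaves <- c("a", "c", "f", "h", "k")

# base R: subset() evaluates the condition using the data frame's columns
res_base <- subset(df, x %in% mustHaves & y %in% mustHaves)

# dplyr (assumed to be what the question used): conditions separated by
# commas are combined with AND
if (requireNamespace("dplyr", quietly = TRUE)) {
  res_dplyr <- dplyr::filter(df, x %in% mustHaves, y %in% mustHaves)
}
```

Both calls should keep the same two rows as the bracket-indexing version, the ones with `z` equal to 1 and 3.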

Source (Stackoverflow)
