Tom Dale - 1 year ago 83
R Question

Subset R data frame based on string matches in two columns

I have a data frame with three columns and thousands of rows. The first two columns (x and y) contain character strings, and the third (z) contains numeric data. I need to subset the data frame based on matching values in both of the first two columns.

``````
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
z <- c(1:10)
df <- data.frame(x, y, z)

df
#    x y  z
# 1  a h  1
# 2  b b  2
# 3  c k  3
# 4  d a  4
# 5  f g  5
# 6  g d  6
# 7  h i  7
# 8  i c  8
# 9  j f  9
# 10 k j 10
``````

Say this is my table, and the values I am interested in are "a", "c", "f", "h" and "k". I only want to return the rows in which both x and y contain one of the five, so in this case rows 1 and 3.

I've tried:

``````
df2 <- filter(df,
              x == ("a" | "c" | "f" | "h" | "k") &
              y == ("a" | "c" | "f" | "h" | "k"))
``````

but this doesn't work for factors or character strings. Is there an equivalent or another way around this?

I think this returns what you are looking for:

``````
# build vector of necessary elements
mustHaves <- c("a", "c", "f", "h", "k")
# perform subsetting
df[with(df, x %in% mustHaves & y %in% mustHaves), ]
#   x y z
# 1 a h 1
# 3 c k 3
``````
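The attempt in the question fails because `|` is a logical operator and cannot combine character strings, whereas `%in%` tests each element for membership in a set. As a rough sketch of the same filter in two other forms: base `subset()`, and `dplyr::filter()` if that package happens to be installed. The name `keep` is purely illustrative and not from the original post.

```r
# Rebuild the example data from the question
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
df <- data.frame(x, y, z = 1:10, stringsAsFactors = FALSE)

# Illustrative name for the values of interest
keep <- c("a", "c", "f", "h", "k")

# %in% gives the same logical vector as chaining == comparisons with |
long_form <- df$x == "a" | df$x == "c" | df$x == "f" |
  df$x == "h" | df$x == "k"
same_result <- identical(long_form, df$x %in% keep)

# Base R: subset() evaluates the condition inside the data frame
res_base <- subset(df, x %in% keep & y %in% keep)

# dplyr: comma-separated conditions are combined with AND
# (guarded, since dplyr may not be installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  res_dplyr <- dplyr::filter(df, x %in% keep, y %in% keep)
}
```

Both forms should keep the same two rows as the bracket-indexing version above; which one to use is mostly a matter of taste.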

data

``````
df <- data.frame(x, y, z, stringsAsFactors = FALSE)
``````
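Since the question mentions both factors and character strings, here is a small check (a sketch, with `df_chr`, `df_fct`, `rows_chr` and `rows_fct` as made-up names) suggesting that the `%in%` approach selects the same rows either way. `%in%` is built on `match()`, which converts factors to character before comparing. Note that `stringsAsFactors` defaults to `FALSE` from R 4.0.0 onward, so the explicit argument matters mainly on older versions.

```r
x <- c("a", "b", "c", "d", "f", "g", "h", "i", "j", "k")
y <- c("h", "b", "k", "a", "g", "d", "i", "c", "f", "j")
mustHaves <- c("a", "c", "f", "h", "k")

# Same data, once with character columns and once with factor columns
df_chr <- data.frame(x, y, z = 1:10, stringsAsFactors = FALSE)
df_fct <- data.frame(x, y, z = 1:10, stringsAsFactors = TRUE)

# Logical row selectors for each version
rows_chr <- with(df_chr, x %in% mustHaves & y %in% mustHaves)
rows_fct <- with(df_fct, x %in% mustHaves & y %in% mustHaves)

# TRUE if both versions pick the same rows
identical(rows_chr, rows_fct)

# Positions of the selected rows
which(rows_chr)
```

With factor columns the filtered result still carries the unused levels; `droplevels()` can remove them if that matters downstream.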