phill phill - 3 months ago 7
R Question

Filter character vector based on first two elements

I have a vector that looks like this:

data <- c("0115", "0159", "0256", "0211")


I want to filter the data based on the first two characters of each element. For example:

group 1 - elements that start with "01"

group 2 - elements that start with "02"

Any idea how to accomplish this?

Answer

You can use a regular expression (regex) to find the strings that start with "01" or "02".

The base R approach is to use grep(), which returns the indices of the strings that match a pattern. Here's an example. Note that I've changed the 2nd and 4th data elements to show how searching for just "01" or "02" leads to an incorrect answer:

d <- c("0115", "0102", "0256", "0201")

grep("01", d)
#> [1] 1 2 4

d[grep("01", d)]
#> [1] "0115" "0102" "0201"

Because this searches for "01" anywhere in the string, you get "0201" in the mix. To avoid this, add "^" to the front of the pattern, which anchors the match to the start of the string:

grep("^01", d)
#> [1] 1 2

d[grep("^01", d)]
#> [1] "0115" "0102"
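
A few base R variations on the same idea may be handy, sketched here on the same vector d. The variable names (starts_01_values and so on) are made up for illustration. grep() with value = TRUE returns the matches directly, grepl() returns a logical vector, and startsWith() (available in R 3.3.0 and later) checks a literal prefix without any regex:

```r
d <- c("0115", "0102", "0256", "0201")

# value = TRUE makes grep() return the matching strings instead of indices
starts_01_values <- grep("^01", d, value = TRUE)

# grepl() returns one TRUE/FALSE per element, usable as a logical index
starts_01_mask <- grepl("^01", d)

# startsWith() compares a literal prefix, so no regex anchoring is needed
starts_01_literal <- d[startsWith(d, "01")]

starts_01_values
#> [1] "0115" "0102"
starts_01_mask
#> [1]  TRUE  TRUE FALSE FALSE
starts_01_literal
#> [1] "0115" "0102"
```

If the prefix is a fixed string rather than a pattern, startsWith() is probably the simplest choice, since there are no special characters to escape.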

If you use the stringr package, str_detect() works with the same pattern. It returns a logical vector rather than indices, which you can use to subset in the same way:

library(stringr)

d[str_detect(d, "^01")]
#> [1] "0115" "0102"
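
If the goal is to get every group at once, as in the question, one possible base R sketch is to extract the first two characters with substr() and pass them to split(). This assumes every element has at least two characters. The names prefix and groups are just illustrative:

```r
data <- c("0115", "0159", "0256", "0211")

# First two characters of each element: "01" "01" "02" "02"
prefix <- substr(data, 1, 2)

# split() returns a named list with one element per distinct prefix
groups <- split(data, prefix)

groups[["01"]]
#> [1] "0115" "0159"
groups[["02"]]
#> [1] "0256" "0211"
```

This avoids writing one pattern per group, so it also covers prefixes you did not know about in advance.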