Positive and negative subsetting using dplyr::contains() and dplyr::select() in R

I'm trying to achieve positive subsetting specifically using a combination of dplyr::select() and dplyr::contains(), with the goal being to subset by multiple string matches.

Minimal working example: when starting off with df1 and doing negative subsetting, I generate df2 as expected. In contrast, when attempting positive subsetting of df1, I generate df3 (no columns) when I'd have expected something like df4. Thanks for any help.

library(dplyr)
df1 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"), "hours"=c(4,6,4), "distance"=c(23,65,21))
# negative subsetting: works as expected
df2 <- df1 %>% select(-contains("ppt_")) %>% select(-contains("het_")) %>% select(-contains("orm_"))
# positive subsetting: returns no columns
df3 <- df1 %>% select(contains("ppt_")) %>% select(contains("het_")) %>% select(contains("orm_"))
# expected result of the positive subsetting
df4 <- data.frame("ppt_paint"=c(45,98,23), "het_heating"=c(1,1,2), "orm_wood"=c("QQ","OA","BB"))

Answer

Think about what happens after df1 %>% select(contains("ppt_")), and have a look at the resulting data.frame. As asked, it retains only the column whose name contains "ppt_". The subsequent select() calls cannot work as you expect, because the other columns are no longer there, no matter what you pass to select().
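To make that concrete, here is a minimal sketch that inspects the intermediate results of the chained calls. The names step1 and step2 are only illustrative and do not appear in the question.

```r
library(dplyr)

df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2),
                  orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4),
                  distance = c(23, 65, 21))

# After the first select(), only the "ppt_" column is left
step1 <- df1 %>% select(contains("ppt_"))
names(step1)
# [1] "ppt_paint"

# The second select() searches within step1, which has no "het_" column,
# so nothing matches and no columns remain
step2 <- step1 %>% select(contains("het_"))
ncol(step2)
# [1] 0
```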

You can keep the same idea but combine your three keys in a single select():

df1 %>% select(contains("ppt_"), contains("het_"), contains("orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB

Alternatively, you can do it with matches(), which accepts regular expressions:

df1 %>% select(matches("ppt_|het_|orm_"))
  ppt_paint het_heating orm_wood
1        45           1       QQ
2        98           1       OA
3        23           2       BB

By the way, you can also use matches() to shorten your "negative" indexing:

df1 %>% select(-matches("ppt_|het_|orm_"))
  hours distance
1     4       23
2     6       65
3     4       21
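
As a further variation on the examples above, contains() can also take a character vector of several keys and select the union of the matches. This is a sketch that assumes a reasonably recent dplyr/tidyselect installation (tidyselect 1.0.0 or later, as far as I know); the names keys, kept and dropped are only illustrative.

```r
library(dplyr)

df1 <- data.frame(ppt_paint = c(45, 98, 23), het_heating = c(1, 1, 2),
                  orm_wood = c("QQ", "OA", "BB"), hours = c(4, 6, 4),
                  distance = c(23, 65, 21))

# The literal substrings to look for in the column names
keys <- c("ppt_", "het_", "orm_")

# Positive subsetting: keep every column whose name contains any of the keys
kept <- df1 %>% select(contains(keys))
names(kept)
# [1] "ppt_paint"   "het_heating" "orm_wood"

# Negative subsetting: drop those same columns
dropped <- df1 %>% select(-contains(keys))
names(dropped)
# [1] "hours"    "distance"
```

Unlike matches(), contains() treats each key as a literal string, so no regular-expression escaping is needed if a key happens to include characters such as "." or "|".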
