David David - 1 year ago 102
R Question

R dplyr select_if with multiple conditions

I would like to select all numeric variables as well as some variables by name. I have managed to use select_if to get the numeric variables and select to get the ones by name, but I can't combine the two into one statement.

x = data.table(c(1,2,3),c(10,11,12),c('a','b','c'),c('x','y','z'), c('l', 'm','n'))

I want my result to be:

V1 V2 V4 V5
1 10 x l
2 11 y m
3 12 z n

I tried this, but it doesn't work:

y = x %>%
select_if(is.numeric, V4, V5)

Answer

If we have a data frame, x:

x = data.frame(V1=c(1,2,3),V2=c(10,11,12),V3=c('a','b','c'),V4=c('x','y','z'),V5=c('l', 'm','n'), stringsAsFactors=FALSE)
##  V1 V2 V3 V4 V5
##1  1 10  a  x  l
##2  2 11  b  y  m
##3  3 12  c  z  n

where V1 and V2 are actually numeric and the rest of the columns are not factors, then we can do:

y <- x %>% select_if(function(col) is.numeric(col) | 
                                   all(col == .$V4) | 
                                   all(col == .$V5))
##  V1 V2 V4 V5
##1  1 10  x  l
##2  2 11  y  m
##3  3 12  z  n

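I'm not saying that this is the best thing to do, but it does do what you want. The issue here is that select_if applies its function to each column in turn and expects a single TRUE or FALSE back for that column, so it cannot also take column names such as V4 and V5 as extra arguments.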
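To make the per-column logic concrete, here is a minimal base R sketch (no dplyr involved) that builds one TRUE/FALSE per column and combines it with a name-based choice. The names is_num and keep are only illustrative, and x is the same data frame as above:

```r
x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# One logical value per column: TRUE where the column is numeric
is_num <- sapply(x, is.numeric)

# Keep a column if it is numeric OR if its name is one we asked for
keep <- is_num | names(x) %in% c("V4", "V5")

# drop = FALSE keeps the result a data frame even if one column remains
y <- x[, keep, drop = FALSE]
```

This selects by column name rather than by comparing column contents, so it would not accidentally keep another column that happens to hold the same values as V4 or V5.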

Another way is to use select:

y <- x %>% select(which(sapply(., is.numeric)), V4, V5)
##  V1 V2 V4 V5
##1  1 10  x  l
##2  2 11  y  m
##3  3 12  z  n

which is probably better.
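If a newer dplyr is available, the same result can probably be written more directly. The sketch below assumes dplyr 1.0.0 or later, where select accepts the where() helper and the scoped verbs such as select_if are superseded. It has not been checked against the version the question was asked about:

```r
library(dplyr)

x <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 11, 12),
                V3 = c("a", "b", "c"), V4 = c("x", "y", "z"),
                V5 = c("l", "m", "n"), stringsAsFactors = FALSE)

# where(is.numeric) picks columns by predicate; V4 and V5 are picked by name
# (assumes dplyr >= 1.0.0)
y <- x %>% select(where(is.numeric), V4, V5)
```

Here the predicate and the column names sit side by side in one select call, which is what the question was originally trying to do with select_if.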