user1923975 - 3 months ago
R Question

Not fully understanding how SE works across the dplyr verbs

I'm trying to understand how SE (standard evaluation) works in dplyr so I can use variables as inputs to these functions. I'm having some trouble understanding how this works across the different functions and when I should be doing what. It would be really good to understand the logic behind this.

Here are some examples:

library(dplyr)
library(lazyeval)


a <- c("x", "y", "z")
b <- c(1,2,3)
c <- c(7,8,9)

df <- data.frame(a, b, c)


The following is exactly why I'd use SE and the *_ variant of a function: I want to change the name of the column being mutated based on another variable.

#Normal mutate - copies b into a column called new
mutate(df, new = b)

#Mutate using a variable column name. Use mutate_ and the unquoted variable name. Doesn't use the name "new", but uses the literal name "col.name" instead
col.name <- "new"
mutate_(df, col.name = "b")

#Do I need to use interp? Doesn't work
expr <- interp(~(val = b), val = col.name)
mutate_(df, expr)


Now I want to filter in the same way. Not sure why my first attempt didn't work.

#Apply the same logic to filter_. the following doesn't return a result
val.to.filter <- "z"
filter_(df, "a" == val.to.filter)

#Do I need to use interp? Works. What's the difference compared to the above?
expr <- interp(~(a == val), val = val.to.filter)
filter_(df, expr)


Now I try select_. Works as expected.

#Apply the same logic to select_, an unquoted variable name works fine
col.to.select <- "b"
select_(df, col.to.select)


Now I move on to rename_. Knowing what worked for mutate and knowing that I had to use interp for filter, I try the following:

#Now let's try to rename. Quoted constant, unquoted variable. Doesn't work
new.name <- "NEW"
rename_(df, "a" = new.name)

#Do I need an eval here? It worked for the filter so it's worth a try. Doesn't work 'Error: All arguments to rename must be named.'
expr <- interp(~(a == val), val = new.name)
rename_(df, expr)


Any tips on best practice when it comes to using variable names across the dplyr functions, and on when interp is required, would be great.

Answer

The differences here are not related to which dplyr verb you are using. They are related to where you are trying to use the variable. You are mixing whether the variable is used as a function argument or not, and whether it should be interpreted as a name or as a character string.
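The name-versus-string distinction can be seen without dplyr at all. The following is a minimal base R sketch, not a description of how dplyr evaluates its arguments internally. It reuses df and val.to.filter from the question; string.cmp, name.expr and name.cmp are names introduced only for this illustration.

```r
# Base R illustration only; dplyr is not involved.
df <- data.frame(a = c("x", "y", "z"), b = c(1, 2, 3), stringsAsFactors = FALSE)
val.to.filter <- "z"

# "a" is a character string here, so R compares the two strings "a" and "z"
# straight away, before any function gets to see the expression.
string.cmp <- "a" == val.to.filter

# Here a is inserted as a name (symbol), so it refers to the column a
# once the expression is evaluated with df as the data.
name.expr <- substitute(col == val, list(col = as.name("a"), val = val.to.filter))
name.cmp <- eval(name.expr, df)

print(string.cmp)  # a single logical value
print(name.cmp)    # one logical value per row of df
```

This is most likely why filter_(df, "a" == val.to.filter) in the question returns no rows: the comparison has already collapsed to a single FALSE by the time filter_ receives it.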

Scenario 1:

You want to use your variable as an argument name, as in your mutate example:

mutate(df, new = b)

Here new is the name of a function argument: it sits to the left of the =. The only way to make that name variable is to use the .dots argument, like this:

col.name <- 'new'
mutate_(df, .dots = setNames(list(~b), col.name))

Running just setNames(list(~b), col.name) shows what is going on: we have an expression (~b), which goes to the right of the =, and a name, which goes to the left of the =.
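To make that concrete, here is a small sketch that inspects the object using base R only (setNames comes from the stats package, which is attached by default). The variable dots is a name chosen for illustration.

```r
col.name <- "new"

# setNames() attaches the name stored in col.name to the one-element list.
dots <- setNames(list(~b), col.name)

print(names(dots))       # the argument name, i.e. what goes left of the =
print(class(dots[[1]]))  # the element is a formula wrapping the expression
print(dots[[1]][[2]])    # the expression itself, i.e. what goes right of the =
```

Because the name comes from an ordinary character vector, changing col.name changes the resulting column name without touching the expression.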

Scenario 2:

You want to pass only a variable as a function argument. This is the simplest case. Let's again use mutate(df, new = b), but this time we want b to be variable. We could use:

v <- 'b'
mutate_(df, .dots = setNames(list(v), 'new'))

Or simply:

mutate_(df, new = v)

Scenario 3:

You want to combine variable and fixed parts. That is, your expression should only be partly variable. For this we use interp. For example, what if we would like to do something like:

mutate(df, new = b + 1)

But being able to change b?

v <- 'b'
mutate_(df, new = interp(~var + 1, var = as.name(v)))

Note that we use as.name to make sure that we insert b (a name) into the expression, not 'b' (a character string).
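The effect of as.name can be checked with a base R analogue. This sketch uses substitute rather than interp, so it only approximates what interp does, and it does not call dplyr. The names with.name, with.string and new.col are hypothetical.

```r
df <- data.frame(a = c("x", "y", "z"), b = c(1, 2, 3))
v <- "b"

# Inserting a name: the resulting expression refers to the column b.
with.name <- substitute(var + 1, list(var = as.name(v)))

# Inserting the string itself: the expression tries to add 1 to the text "b".
with.string <- substitute(var + 1, list(var = v))

print(with.name)
print(with.string)

# Only the version built with as.name() can be evaluated against df;
# evaluating with.string fails because a string cannot be added to a number.
new.col <- eval(with.name, df)
print(new.col)
```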