Rahul Rahul - 1 month ago 7x
R Question

Pass a string argument to a function as dataframe column name in dplyr

I am trying to pass a string variable to a function, to be used as the column name after some data alteration.

Here is the function:

cleandata <- function(df,name){
df <- df %>%
gather(key = 'Year',value = name,X1960:X2015)
df <- df %>%
df$Year <- substr(df$Year,start = 2,stop = 5)
df$Year <- as.factor(df$Year)

I want to pass a string variable to 'name', and have it as the column name.

The current output of the function is:

> cleandata(lifeexp,'LifeExp')
Source: local data frame [13,888 x 4]

Country.Name Country.Code Year name
(fctr) (fctr) (fctr) (dbl)
1 Aruba ABW 1960 65.56937
2 Andorra AND 1960 NA
3 Afghanistan AFG 1960 32.32851
4 Angola AGO 1960 32.98483
5 Albania ALB 1960 62.25437
6 Arab World ARB 1960 46.84706
7 United Arab Emirates ARE 1960 52.24322
8 Argentina ARG 1960 65.21554
9 Armenia ARM 1960 65.86346
10 American Samoa ASM 1960 NA
.. ... ... ... ...

The last column should be 'LifeExp', not name. What am I missing?

Thanks in advance,



You want to use gather_ here. See vignette('nse') for an explanation why.

year_cols <- names(df)[grepl('^X\\d{4}$', names(df))]
df %>% gather_('Year', name, year_cols)

The issue is gather takes an unquoted name for its key and value columns, so you can't pass in a variable name. It's just going to interpret what ever variable name you put in there as the the unquoted name you want for the value column. This is consistent with the principle that the tidyr functions without underscores are meant for interactive use and those with underscores should be used when your effort is more programmatic.