Konrad Konrad - 13 hours ago 2
R Question

Execute dplyr operation only if column exists

Drawing on the discussion on conditional dplyr evaluation, I would like to conditionally execute a step in a pipeline depending on whether the referenced column exists in the passed data frame.

Example



The results generated by 1) and 2) should be identical.


Existing column



# 1)
mtcars %>%
  filter(am == 1) %>%
  filter(cyl == 4)

# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if("cyl" %in% names(.)) filter(cyl == 4) else .
  }


Unavailable column



# 1)
mtcars %>%
  filter(am == 1)

# 2)
mtcars %>%
  filter(am == 1) %>%
  {
    if("absent_column" %in% names(.)) filter(absent_column == 4) else .
  }


Problem



When the column exists, the object passed to the inner filter call does not correspond to the initial data frame. The original code returns the error message:


Error in filter(cyl == 4) : object 'cyl' not found


I have tried an alternative syntax (with no luck):

>> mtcars %>%
... filter(am == 1) %>%
... {
... if("cyl" %in% names(.)) filter(.$cyl == 4) else .
... }
Error in UseMethod("filter_") :
no applicable method for 'filter_' applied to an object of class "logical"

Answer

Because of the way scoping works here, you cannot implicitly access the data frame from within your if statement. Fortunately, you don't need to.

Try:

mtcars %>%
  filter(am == 1) %>%
  filter({if("cyl" %in% names(.)) cyl else NULL} == 4)

Here you can use the '.' object within the conditional so you can check if the column exists and, if it exists, you can return the column to the filter function.
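One thing to watch with this form: when the column is absent, the condition becomes NULL == 4, which evaluates to a zero-length logical vector, and how filter() treats that may depend on the dplyr version. A minimal sketch of a variant that keeps all rows in that case is to move the whole comparison inside the if and fall back to TRUE. The names with_cyl and without_col are only illustrative.

```r
library(dplyr)

# Column exists: the comparison cyl == 4 is applied as usual.
with_cyl <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)

# Column is absent: the condition is a single TRUE, so every row is kept.
without_col <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("absent_column" %in% names(.)) absent_column == 4 else TRUE)
```

Here the first result should match filter(am == 1) %>% filter(cyl == 4), and the second should match filter(am == 1) alone.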

EDIT: as per docendo discimus' comment on the question, you can access the data frame, but not implicitly; i.e. you have to reference it explicitly with '.'.
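Applied to the braces form from the question, that explicit reference could look like the sketch below: the only change is that '.' is passed as the first argument of the inner filter() call. The names braces_existing and braces_absent are only illustrative.

```r
library(dplyr)

# Existing column: '.' is handed to filter() explicitly inside the braces.
braces_existing <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("cyl" %in% names(.)) filter(., cyl == 4) else .
  }

# Absent column: the else branch returns the data frame unchanged.
braces_absent <- mtcars %>%
  filter(am == 1) %>%
  {
    if ("absent_column" %in% names(.)) filter(., absent_column == 4) else .
  }
```

Because the right-hand side is wrapped in braces, the pipe does not insert the data frame as the first argument automatically, which is why the bare filter(cyl == 4) in the question fails.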