Michael Szczepaniak - 6 months ago
R Question

How to pass an anonymous function to dplyr summarise

I have a simple dataframe with 3 columns: name, goal, and actual.
Because this is a simplification of a much larger dataframe, I want to use dplyr to compute the number of times each person has met a goal.

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))

The result should look like this:

[image: expected result, one met_goal count per name]

I should be able to pass an anonymous function similar to what is shown below, but don't have the syntax quite right:

g <- group_by(df, name)
summ <- summarise(g, met_goal = sum((function(x, y) {
    # if (x == y) { ... }   scalar if() test; the rest of the body is missing
})(goal, actual)))

When I run the code above, I see 3 of these warnings:

Warning messages:
1: In if (x == y) { :
the condition has length > 1 and only the first element will be used

Answer

goal and actual are equal-length vectors, so the relational operators are appropriate here: they compare element by element and return a logical vector. Inside a plain if() statement, however, the result is unexpected because if() expects a length-1 condition; that is what triggers the warning above (in R 4.2.0 and later it is an error). Since the vectors have equal length and we need a binary result per row, summing the logical vector (TRUE counts as 1, FALSE as 0) is the best approach, as follows.

group_by(df, name) %>%
    summarise(met_goal = sum(goal <= actual))
# A tibble: 2 x 2
    name met_goal
  <fctr>    <int>
1   Fred        2
2  Sally        1

The operator is <= rather than > because you want 0 when goal > actual and 1 otherwise, and goal <= actual is TRUE (counted as 1) in exactly those remaining cases.
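
To see why summing works, it can help to look at the intermediate logical vector outside dplyr. The following is a minimal base-R sketch using the question's data; the variable names met and per_person are introduced here only for illustration.

```r
# Same data as in the question, as plain vectors
name   <- c(rep("Fred", 3), rep("Sally", 4))
goal   <- c(4, 6, 5, 7, 3, 8, 5)
actual <- c(4, 5, 5, 3, 3, 6, 4)

# <= is vectorised: it returns one TRUE/FALSE per row
met <- goal <= actual
print(met)

# sum() treats TRUE as 1 and FALSE as 0, so this counts the rows where the goal was met
print(sum(met))

# Base-R equivalent of the grouped summarise: sum the logical vector per name
per_person <- tapply(met, name, sum)
print(per_person)
```

The per-name counts here should match the met_goal column of the summarise() result above.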

Note that you can use an anonymous function. It was the if() statement that was throwing you off. For example, using

sum((function(x, y) x <= y)(goal, actual)) 

would work in the manner you are asking about.
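
If you prefer to keep explicit if/else logic inside the anonymous function, one option is ifelse(), which is the vectorised counterpart of if(). The sketch below assumes dplyr is installed and reuses the data frame from the question; it is one possible way to write it, not the only one.

```r
library(dplyr)

df <- data.frame(name = c(rep("Fred", 3), rep("Sally", 4)),
                 goal = c(4, 6, 5, 7, 3, 8, 5),
                 actual = c(4, 5, 5, 3, 3, 6, 4))

# ifelse() evaluates the test for every element, so the anonymous
# function returns a vector of 1s and 0s instead of a single value
summ <- df %>%
  group_by(name) %>%
  summarise(met_goal = sum((function(x, y) ifelse(x <= y, 1, 0))(goal, actual)))

print(summ)
```

This gives the same counts as sum(goal <= actual); the ifelse() form is mainly useful when the two branches need to return something other than 1 and 0.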
