Felix S - 1 year ago
R Question

# dplyr filter: Get rows with minimum of variable, but only the first if multiple minima

I want to make a grouped filter using `dplyr` such that, within each group, only the row with the minimum value of variable `x` is returned.

My problem: as expected, when a group has multiple minima, all rows with the minimum value are returned. In my case, however, I only want the first of those rows.

Here's an example:

```r
df <- data.frame(
  A = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  x = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  y = rnorm(9)
)

library(dplyr)
df.g <- group_by(df, A)
filter(df.g, x == min(x))
```

As expected, all minima are returned:

```
Source: local data frame [6 x 3]
Groups: A

  A x           y
1 A 1 -1.04584335
2 A 1  0.97949399
3 B 2  0.79600971
4 C 5 -0.08655151
5 C 5  0.16649962
6 C 5 -0.05948012
```
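This happens because `x == min(x)` is evaluated element by element, so every value tied with the minimum compares `TRUE`. A minimal base-R sketch on a hypothetical toy vector contrasts that with `which.min()`, which returns only the position of the first minimum:

```r
# Hypothetical toy vector with a tied minimum at positions 1 and 2
x <- c(1, 1, 2)

# Comparing against min() flags every tied minimum
all_minima <- which(x == min(x))
print(all_minima)  # [1] 1 2

# which.min() returns only the index of the first minimum
first_minimum <- which.min(x)
print(first_minimum)  # [1] 1
```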

With `ddply`, I would have approached the task this way:

```r
library(plyr)
ddply(df, .(A), function(z) {
  # keep all rows at the group minimum, then take only the first of them
  z[z$x == min(z$x), ][1, ]
})
```

... which works:

```
  A x           y
1 A 1 -1.04584335
2 B 2  0.79600971
3 C 5 -0.08655151
```
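For comparison, the same per-group logic can be sketched in base R without `plyr`. This is not from the original post: it swaps the `z[z$x == min(z$x), ][1, ]` subset for `which.min()`, which already picks the first minimum, and uses `split()`/`do.call(rbind, ...)` in place of `ddply`:

```r
# Same example data as in the question (y values are random)
df <- data.frame(
  A = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  x = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  y = rnorm(9)
)

# One data frame per group; keep the first row holding that group's minimum x
per_group <- lapply(split(df, df$A), function(z) z[which.min(z$x), ])

# Stack the one-row results back into a single data frame
result <- do.call(rbind, per_group)
print(result)
```

Because `which.min()` returns a single index, there is no need for the extra `[1, ]` step.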

Q: Is there a way to do this in `dplyr`? (I'm asking for speed reasons.)

Answer

Just for completeness, here's the final `dplyr` solution, derived from the comments of @hadley and @Arun:

```r
library(dplyr)
df.g <- group_by(df, A)
# ties.method = "first" gives tied values distinct ranks, so only one row per group has rank 1
filter(df.g, rank(x, ties.method = "first") == 1)
```
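Why does `rank(x, ties.method = "first") == 1` keep exactly one row per group? With `ties.method = "first"`, tied values receive distinct ranks in order of appearance, so within a group the ranks never repeat and only one row can have rank 1. The base-R sketch below mimics the grouped `filter()` with `ave()` on hypothetical toy data (`g` and `x` are made up for illustration), so it runs without `dplyr`:

```r
# Hypothetical toy data: group "a" has a tied minimum, group "b" does not
g <- c("a", "a", "a", "b", "b")
x <- c(1, 1, 2, 4, 3)

# Rank x within each group; ties are broken by position, so no rank repeats
r <- ave(x, g, FUN = function(v) rank(v, ties.method = "first"))
print(r)  # [1] 1 2 3 2 1

# Exactly one row per group has rank 1: the first occurrence of the minimum
keep <- r == 1
print(which(keep))  # [1] 1 5
```

In more recent `dplyr` releases the same selection can also be written as `slice(df.g, which.min(x))` or, from `dplyr` 1.0.0 onward, `slice_min(df.g, x, n = 1, with_ties = FALSE)`, where `with_ties = FALSE` stops tied rows from being added back.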