Ed Swindelles - 1 year ago 97
R Question

Extract row corresponding to minimum value of a variable by group

I wish to (1) group data by one variable (

`State`
), (2) within each group find the row of minimum value of another variable (
`Employees`
), and (3) extract the entire row.

(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.

Here is a sample data set:

``````> data
State Company Employees
1    AK       A        82
2    AK       B       104
3    AK       C        37
4    AK       D        24
5    RI       E        19
6    RI       F       118
7    RI       G        88
8    RI       H        42

data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",
"Employees"), class = "data.frame", row.names = c(NA, -8L))
``````

Calculate
`min`
by group is easy, using
`aggregate`
:

``````> aggregate(Employees ~ State, data, function(x) min(x))
State Employees
1    AK        24
2    RI        19
``````

...or
`data.table`
:

``````> library(data.table)
> DT <- data.table(data)
> DT[ , list(Employees = min(Employees)), by = State]
State Employees
1:    AK        24
2:    RI        19
``````

But how do I extract the entire row corresponding to these
`min`
values, i.e. also including
`Company`
in the result?

Slightly more elegant:

``````library(data.table)
DT[ , .SD[which.min(Employees)], by = State]

State Company Employees
1:    AK       D        24
2:    RI       E        19
``````

Slighly less elegant than using `.SD`, but a bit faster (for data with many groups):

``````DT[DT[ , .I[which.min(Employees)], by = State]\$V1]
``````

Also, just replace the expression `which.min(Employees)` with `Employees == min(Employees)`, if your data set has multiple identical min values and you'd like to subset all of them.