Ed Swindelles - 6 months ago 52

R Question

I wish to (1) group data by one variable (

`State`

`Employees`

(1) and (2) are easy one-liners, and I feel like (3) should be too, but I can't get it.

Here is a sample data set:

`> data`

State Company Employees

1 AK A 82

2 AK B 104

3 AK C 37

4 AK D 24

5 RI E 19

6 RI F 118

7 RI G 88

8 RI H 42

data <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,

2L), .Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",

"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,

104L, 37L, 24L, 19L, 118L, 88L, 42L)), .Names = c("State", "Company",

"Employees"), class = "data.frame", row.names = c(NA, -8L))

Calculate

`min`

`aggregate`

`> aggregate(Employees ~ State, data, function(x) min(x))`

State Employees

1 AK 24

2 RI 19

...or

`data.table`

`> library(data.table)`

> DT <- data.table(data)

> DT[ , list(Employees = min(Employees)), by = State]

State Employees

1: AK 24

2: RI 19

But how do I extract the entire row corresponding to these

`min`

`Company`

Answer

Slightly more elegant:

```
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
```

Slighly less elegant than using `.SD`

, but a bit faster (for data with many groups):

```
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
```

Also, just replace the expression `which.min(Employees)`

with `Employees == min(Employees)`

, if your data set has multiple identical min values and you'd like to subset all of them.

See also Subset by group with data.table.