Ozzy - 1 year ago 66
R Question

# How to subset data based on combination of criteria in R

I have a several million rows of data and I need to create a subset. No success despite of trying hard and searching all over the web. The question is:

How to create a subset including only the smallest values of

`value`
for all
`ID`
&
`item`
combinations?

The data structure looks like this:

``````> df = data.frame(ID = c(1,1,1,1,2,2,2,2),
item = c('A','A','B','B','A','A','B','B'),
value = c(10,5,3,2,7,8,9,10))

> df
ID item value
1  1    A    10
2  1    A     5
3  1    B     3
4  1    B     2
5  2    A     7
6  2    A     8
7  2    B     9
8  2    B    10
``````

The the result should look like this:

``````  ID item value
1    A     5
1    B     2
2    A     7
2    B     9
``````

Any hints greatly appreciated. Thank you!

We can use `aggregate` from `baseR` with grouping variables 'ID' and 'item' to get the `min` of 'value'

``````aggregate(value~., df, min)
#  ID item value
#1  1    A     5
#2  2    A     7
#3  1    B     2
#4  2    B     9
``````

Or using `dplyr`

``````library(dplyr)
df %>%
group_by(ID, item) %>%
summarise(value = min(value))
``````

Or with `data.table`

``````library(data.table)
setDT(df)[, .(value = min(value)) , .(ID, item)]
``````

Or another option would be to `order` and get the `first` row after grouping

``````setDT(df)[order(value), head(.SD, 1), .(ID, item)]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download