Ozzy Ozzy - 3 months ago 9
R Question

How to subset data based on combination of criteria in R

I have a several million rows of data and I need to create a subset. No success despite of trying hard and searching all over the web. The question is:

How to create a subset including only the smallest values of

value
for all
ID
&
item
combinations?


The data structure looks like this:

> df = data.frame(ID = c(1,1,1,1,2,2,2,2),
item = c('A','A','B','B','A','A','B','B'),
value = c(10,5,3,2,7,8,9,10))

> df
ID item value
1 1 A 10
2 1 A 5
3 1 B 3
4 1 B 2
5 2 A 7
6 2 A 8
7 2 B 9
8 2 B 10


The the result should look like this:

ID item value
1 A 5
1 B 2
2 A 7
2 B 9


Any hints greatly appreciated. Thank you!

Answer

We can use aggregate from baseR with grouping variables 'ID' and 'item' to get the min of 'value'

aggregate(value~., df, min)
#  ID item value
#1  1    A     5
#2  2    A     7
#3  1    B     2
#4  2    B     9

Or using dplyr

library(dplyr)
df %>%
   group_by(ID, item) %>%
   summarise(value = min(value))

Or with data.table

library(data.table)
setDT(df)[, .(value = min(value)) , .(ID, item)]

Or another option would be to order and get the first row after grouping

setDT(df)[order(value), head(.SD, 1), .(ID, item)]