Kou Kou - 3 months ago 6
R Question

Picking up only specific columns based on conditions on multiple columns in R

I have a data frame, say

df <- data.frame(x = c(1,2,5,6,3,3,3,6,8,8,8,8),
y = c(1,1,1,1,1,2,3,1,1,2,3,4),
z = c("a","b","c","d","e","f","g","h","i","j","k","l"))


it looks like this

x y z
1 1 1 a
2 2 1 b
3 5 1 c
4 6 1 d
5 3 1 e
6 3 2 f
7 3 3 g
8 6 1 h
9 8 1 i
10 8 2 j
11 8 3 k
12 8 4 l


I would like pick unique elements from column x, based on column y such that y should be maximum (in this case say for row number 5 to 7 are 3'3, I would like to pick the x = 3 corresponding to y = 3 (maximum value) similarly for x = 8 I d like to pick y = 4 row )

the output should look like this

x y z
1 1 1 a
2 2 1 b
3 5 1 c
4 6 1 d
5 3 3 g
6 6 1 h
7 8 4 l


I have a solution for that, which I am posting in the solution, but if there is there any better method to achieve this, My solution only works in this specific case (picking the largest) what is the general case solution for this?

Answer

One solution using dplyr

library(dplyr) 
df %>% 
 group_by(x) %>% 
 slice(max(y))

#      x     y     z
#  (dbl) (dbl) (chr)
#1     1     1     a
#2     2     1     b
#3     3     3     g
#4     5     1     c
#5     6     1     d
#6     8     4     l

The base R alternative is using aggregate

aggregate(y~x, df, max)
Comments