Kou - 1 year ago 67
R Question

# Picking up only specific columns based on conditions on multiple columns in R

I have a data frame, say

``````df <- data.frame(x = c(1,2,5,6,3,3,3,6,8,8,8,8),
y = c(1,1,1,1,1,2,3,1,1,2,3,4),
z = c("a","b","c","d","e","f","g","h","i","j","k","l"))
``````

it looks like this

``````   x y z
1  1 1 a
2  2 1 b
3  5 1 c
4  6 1 d
5  3 1 e
6  3 2 f
7  3 3 g
8  6 1 h
9  8 1 i
10 8 2 j
11 8 3 k
12 8 4 l
``````

I would like pick unique elements from column x, based on column y such that y should be maximum (in this case say for row number 5 to 7 are 3'3, I would like to pick the x = 3 corresponding to y = 3 (maximum value) similarly for x = 8 I d like to pick y = 4 row )

the output should look like this

``````  x y z
1 1 1 a
2 2 1 b
3 5 1 c
4 6 1 d
5 3 3 g
6 6 1 h
7 8 4 l
``````

I have a solution for that, which I am posting in the solution, but if there is there any better method to achieve this, My solution only works in this specific case (picking the largest) what is the general case solution for this?

One solution using `dplyr`

``````library(dplyr)
df %>%
group_by(x) %>%
slice(max(y))

#      x     y     z
#  (dbl) (dbl) (chr)
#1     1     1     a
#2     2     1     b
#3     3     3     g
#4     5     1     c
#5     6     1     d
#6     8     4     l
``````

The `base R` alternative is using `aggregate`

``````aggregate(y~x, df, max)
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download