Kou - 4 months ago 13

R Question

I have a data frame, say

`df <- data.frame(x = c(1,2,5,6,3,3,3,6,8,8,8,8),`

y = c(1,1,1,1,1,2,3,1,1,2,3,4),

z = c("a","b","c","d","e","f","g","h","i","j","k","l"))

it looks like this

`x y z`

1 1 1 a

2 2 1 b

3 5 1 c

4 6 1 d

5 3 1 e

6 3 2 f

7 3 3 g

8 6 1 h

9 8 1 i

10 8 2 j

11 8 3 k

12 8 4 l

I would like pick unique elements from column x, based on column y such that y should be maximum (in this case say for row number 5 to 7 are 3'3, I would like to pick the x = 3 corresponding to y = 3 (maximum value) similarly for x = 8 I d like to pick y = 4 row )

the output should look like this

`x y z`

1 1 1 a

2 2 1 b

3 5 1 c

4 6 1 d

5 3 3 g

6 6 1 h

7 8 4 l

I have a solution for that, which I am posting in the solution, but if there is there any better method to achieve this, My solution only works in this specific case (picking the largest) what is the general case solution for this?

Answer

One solution using `dplyr`

```
library(dplyr)
df %>%
group_by(x) %>%
slice(max(y))
# x y z
# (dbl) (dbl) (chr)
#1 1 1 a
#2 2 1 b
#3 3 3 g
#4 5 1 c
#5 6 1 d
#6 8 4 l
```

The `base R`

alternative is using `aggregate`

```
aggregate(y~x, df, max)
```