Nick Knauer - 2 months ago
R Question

Remove Duplicates by Unique Value in another Column

I have a dataframe that looks like this:

COLA COLB COLC
A nb 1
A nc 0.8
A bc 0.7
A nb 0.7 <------------
B nb 1
B nc 0.3 <------------
B nc 0.8
B aa 0.9


Within each unique COLA ID, I want to remove duplicate COLB values and keep only the row with the maximum COLC value for each duplicate.

So I want the final result to look like this (the arrows in the previous table point to the rows I want to delete):

COLA COLB COLC
A nb 1
A nc 0.8
A bc 0.7
B nb 1
B nc 0.8
B aa 0.9

Answer

We can use dplyr. After arranging by 'COLA' and by 'COLC' in descending order, we group by 'COLA' and 'COLB' and take the first row of each group with slice. Because of the sort, that first row is the one with the maximum 'COLC'.

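library(dplyr)

df1 %>%
  arrange(COLA, desc(COLC)) %>%  # within each COLA, largest COLC first
  group_by(COLA, COLB) %>%       # one group per COLA/COLB pair
  slice(1L)                      # keep the first (maximum COLC) row per group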
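
For readers who would rather not load dplyr, the same sort-then-keep-first idea can be sketched in base R. This is only an illustration, not part of the original answer: the data frame is rebuilt by hand from the table in the question, the name df1 is borrowed from the answer's code, and the names sorted and result are made up for the example.

```r
# Example data copied from the table in the question.
df1 <- data.frame(
  COLA = c("A", "A", "A", "A", "B", "B", "B", "B"),
  COLB = c("nb", "nc", "bc", "nb", "nb", "nc", "nc", "aa"),
  COLC = c(1, 0.8, 0.7, 0.7, 1, 0.3, 0.8, 0.9),
  stringsAsFactors = FALSE
)

# Sort by COLA, then by COLC in descending order.
sorted <- df1[order(df1$COLA, -df1$COLC), ]

# duplicated() flags every repeat of a COLA/COLB pair after its first
# occurrence; after the sort, the first occurrence has the largest COLC.
result <- sorted[!duplicated(sorted[c("COLA", "COLB")]), ]
rownames(result) <- NULL  # reset row names left over from the sort

print(result)
```

Note that the rows come back in sorted order (by COLA, then descending COLC) rather than in the order of the original table. If the original order matters, one option is to add a row index column before sorting and re-sort on it at the end.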