Mohammad Saifullah - 3 days ago
R Question

R: identify duplicate rows and remove the older entry (by date)

I have a dataframe of the following form:

  ID value   modified
1 AA    30 2016-11-03
2 AB    40 2016-11-04
3 AC    50 2016-11-05
4 AA    60 2016-11-06
5 AB    20 2016-11-07


I want to identify all rows that share the same value in the ID column and, for each ID, keep only the row with the most recent modification date, removing the older ones. So the output will be:

  ID value   modified
1 AC    50 2016-11-05
2 AA    60 2016-11-06
3 AB    20 2016-11-07


The code I am trying is as follows:

ID <- c('AA', 'AB', 'AC', 'AA', 'AB')
value <- c(30, 40, 50, 60, 20)
modified <- c('2016-11-03', '2016-11-04', '2016-11-05', '2016-11-06', '2016-11-07')
df <- data.frame(ID = ID, value = value, modified = modified)
df
  ID value   modified
1 AA    30 2016-11-03
2 AB    40 2016-11-04
3 AC    50 2016-11-05
4 AA    60 2016-11-06
5 AB    20 2016-11-07

df[!duplicated(df$ID), ]
  ID value   modified
1 AA    30 2016-11-03
2 AB    40 2016-11-04
3 AC    50 2016-11-05


But this is not my desired output: it keeps the first (oldest) row for each ID instead of the newest. How can I remove the old entries? Thank you in advance for any clues or hints.

Answer

You can use the dplyr package as follows:

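library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

# Convert to Date first: before R 4.0, data.frame() stores strings as factors,
# and max() is not defined for factors
df$modified <- as.Date(df$modified)
# Keep, within each ID, only the row(s) with the latest modification date
df %<>% group_by(ID) %>% filter(modified == max(modified))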

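If you want the result in a new variable instead:

library(dplyr)

# as.Date() ensures max() compares real dates rather than factors or strings
df2 <- df %>% mutate(modified = as.Date(modified)) %>% group_by(ID) %>% filter(modified == max(modified))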
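
For comparison, here is a minimal base R sketch of the same idea that needs no extra packages. It builds on the `duplicated()` attempt from the question: since `duplicated()` keeps the first occurrence of each ID, sorting the rows newest-first beforehand makes it keep the latest entry instead. The names `newest_first` and `latest` are illustrative only, and the data frame is rebuilt here (with ID "AC", as in the sample table) so the snippet runs on its own.

```r
# Rebuild the example data from the question (ID "AC" as in the sample table)
df <- data.frame(
  ID = c("AA", "AB", "AC", "AA", "AB"),
  value = c(30, 40, 50, 60, 20),
  modified = as.Date(c("2016-11-03", "2016-11-04", "2016-11-05",
                       "2016-11-06", "2016-11-07")),
  stringsAsFactors = FALSE
)

# Sort newest first so that duplicated() flags the older rows of each ID
newest_first <- df[order(df$modified, decreasing = TRUE), ]
latest <- newest_first[!duplicated(newest_first$ID), ]

# Restore chronological order and reset the row names
latest <- latest[order(latest$modified), ]
rownames(latest) <- NULL
latest
```

Unlike the `filter(modified == max(modified))` approach, this keeps exactly one row per ID even when two rows of the same ID share the latest date, which may or may not be what you want.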