Pietro Pietro - 3 months ago 33
R Question

Earliest Date for each id in R

I have a dataset where each individual (id) has an e_date, and since each individual could have more than one e_date, I'm trying to get the earliest date for each individual. So basically I would like to have a dataset with one row per each id showing his earliest e_date value.
I've use the aggregate function to find the minimum values, I've created a new variable combining the date and the id and last I've subset the original dataset based on the one containing the minimums using the new variable created. I've come to this:

new <- aggregate(e_date ~ id, data_full, min)

data_full["comb"] <- NULL
data_full$comb <- paste(data_full$id,data_full$e_date)

new["comb"] <- NULL
new$comb <- paste(new$lopnr,new$EDATUM)

data_fixed <- data_full[which(new$comb %in% data_full$comb),]


The first thing is that the aggregate function doesn't seems to work at all, it reduces the number of rows but viewing the data I can clearly see that some ids appear more than once with different e_date. Plus, the code gives me different results when I use the as.Date format instead of its original format for the date (integer). I think the answer is simple but I'm struck on this one.

Answer

You may use library(sqldf) to get the minimum date as follows:

data1<-data.frame(id=c("789","123","456","123","123","456","789"),
                  e_date=c("2016-05-01","2016-07-02","2016-08-25","2015-12-11","2014-03-01","2015-07-08","2015-12-11"))  

library(sqldf)
data2 = sqldf("SELECT id,
                    min(e_date) as 'earliest_date'
                    FROM data1 GROUP BY 1", method = "name__class")    

head(data2)   

id earliest_date
123 2014-03-01
456 2015-07-08
789 2015-12-11

Comments