jormaga jormaga - 1 month ago 6
R Question

Using Apply instead of for (using 2 columns of a data frame)

I have a data frame like this:

Letters Dates
A 22/03/2015
A 22/03/2015
A 23/03/2015
B 22/03/2015
B 23/03/2015
C 23/03/2015
C 23/03/2015


I'd like to create another column (Dates2) that assigns to each row the minimum date among all rows with the same letter. This is the desired result:

Letters Dates Dates2
A 22/03/2015 22/03/2015
A 22/03/2015 22/03/2015
A 23/03/2015 22/03/2015
B 22/03/2015 22/03/2015
B 23/03/2015 22/03/2015
C 23/03/2015 23/03/2015
C 23/03/2015 23/03/2015


I wrote the following code with a for loop, but I'd like to do it more efficiently (working with vectors instead of loops). How can I do it with Apply / other solutions?

rm(list=ls())

data <- data.frame(rbind(c("A", "22/03/2015"),
c("A", "22/03/2015"),
c("A", "23/03/2015"),
c("B", "22/03/2015"),
c("B", "23/03/2015"),
c("C", "23/03/2015"),
c("C", "23/03/2015")
), stringsAsFactors=FALSE)

colnames(data) <- c("Letters", "Dates")

for (i in 1:nrow(data))
{
  thisLetter = data$Letters[i]
  temp = subset(data$Dates, data$Letters == thisLetter)
  min_date = min(as.Date(temp, "%d/%m/%Y"))
  data$Dates2[i] = format(min_date, "%d/%m/%Y")
}


Thank you very much!

Answer

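We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(), order the rows by 'Dates' after converting to Date class, and, grouped by 'Letters', take the first element of 'Dates' (head(Dates, 1)), which is then the earliest date in each group. Assign (:=) it to create the 'Dates2' column. The dates are stored as day/month/year, so as.Date() needs the explicit format "%d/%m/%Y"; without it, a string like "22/03/2015" is not parsed as 22 March 2015.

library(data.table)
setDT(data)[order(as.Date(Dates, "%d/%m/%Y")), Dates2 := head(Dates, 1), by = Letters]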

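Or with dplyr. The same "%d/%m/%Y" format is needed here, and note that arrange() sorts the whole data frame, so the rows come back ordered by date rather than in their original order:

library(dplyr)
data %>%
     group_by(Letters) %>%
     arrange(as.Date(Dates, "%d/%m/%Y")) %>%
     mutate(Dates2 = first(Dates))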
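
Since the question asks about the apply family, here is a minimal base R sketch using ave(), which applies a function within each group and returns a vector aligned with the original rows. It is only one possible approach, not part of the answer above; the intermediate names parsed and min_days are made up for illustration, and the dates are assumed to be day/month/year strings as in the question.

```r
# Rebuild the example data from the question.
data <- data.frame(
  Letters = c("A", "A", "A", "B", "B", "C", "C"),
  Dates   = c("22/03/2015", "22/03/2015", "23/03/2015",
              "22/03/2015", "23/03/2015", "23/03/2015", "23/03/2015"),
  stringsAsFactors = FALSE
)

# Parse the strings as day/month/year (assumed format, as in the question).
parsed <- as.Date(data$Dates, format = "%d/%m/%Y")

# ave() computes the minimum within each Letters group and repeats it for
# every row of that group. Working on the numeric day counts avoids relying
# on how ave() handles the Date class.
min_days <- ave(as.numeric(parsed), data$Letters, FUN = min)

# Convert the day counts back to Date, then to the original text format.
data$Dates2 <- format(as.Date(min_days, origin = "1970-01-01"), "%d/%m/%Y")

print(data)
```

Unlike the sorting-based approaches, this does not reorder the rows, and it needs no extra packages.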