tesseracT - 2 months ago
R Question

Adding a random number of days to dates via some function

I am pretty new to R, and I have been working on this problem for a while now.
My data contains a column of order dates and a column of delivery dates. Some of the delivery dates are a single date (12/31/1990) that falls before the order date, which causes problems when calculating the average shipping time. For these rows, I would like to take the order date and add a random number of days drawn from a uniform distribution.

First, I tried to write a function that I could apply to the data, but the result was not what I wanted. What I want is for the simulated delivery date to end up in the delivery date column. Although I can't provide my data, my code is here:

Note that typeof(x) returns "list". Also, my data consists of 147999 rows and 13 columns. For example, x[1,1] = 2013-01-01 and x[1,2] = 1990-12-31.

func1 = function(x){
  if(x[2]=="1990-12-31" && !is.na(x[2]))
    x[2] = as.Date(x[1]) + floor(runif(1,min=0,max=30))
  return (x)
}


Example data:

x <- structure(list(orderDate = structure(c(15706, 15706, 15706, 15706,
15706), class = "Date"), deliveryDate = structure(c(15707, 15707,
7669, 15707, 7669), class = "Date")), .Names = c("orderDate",
"deliveryDate"), row.names = c(NA, 5L), class = "data.frame")

# orderDate deliveryDate
#1 2013-01-01 2013-01-02
#2 2013-01-01 2013-01-02
#3 2013-01-01 1990-12-31
#4 2013-01-01 2013-01-02
#5 2013-01-01 1990-12-31

Answer

If I understand correctly, x is a data frame with 2 columns. A vectorized version of if is available as ifelse:

x[[2]] <- structure(ifelse(x[[2]] == "1990-12-31" & !is.na(x[[2]]),
                           as.Date(x[[1]]) + sample(0:30, 1),
                           x[[2]]),
                    class = "Date")

Or, faster, replace only the affected rows using a logical index:

ind <- x[[2]] == "1990-12-31" & !is.na(x[[2]])
x[ind, 2] <- as.Date(x[ind, 1]) + sample(0:30, sum(ind), replace = TRUE)

With your example dataset and the same random seed 0, both options give the same result:

#   orderDate deliveryDate
#1 2013-01-01   2013-01-02
#2 2013-01-01   2013-01-02
#3 2013-01-01   2013-01-28
#4 2013-01-01   2013-01-02
#5 2013-01-01   2013-01-28
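
For reuse on the full data set, the logical-index approach could be wrapped in a small function. This is only a sketch: the name fix_delivery_dates and its placeholder and max_days arguments are made up for illustration, and it assumes the columns are called orderDate and deliveryDate as in the example data. Two things worth knowing: sample(0:30, 1) inside ifelse draws a single offset that is recycled for every matching row, whereas the version below draws one offset per row; and sample(0:30, ...) can return 30, while floor(runif(1, min=0, max=30)) from the question only yields 0 to 29.

```r
# Hypothetical helper (not part of the original answer): replaces placeholder
# delivery dates with orderDate plus a per-row random number of days.
fix_delivery_dates <- function(df, placeholder = as.Date("1990-12-31"),
                               max_days = 30) {
  # TRUE only where deliveryDate equals the placeholder; NA rows become FALSE.
  ind <- !is.na(df$deliveryDate) & df$deliveryDate == placeholder
  # One independent draw per affected row, uniform on 0..max_days.
  offsets <- sample(0:max_days, sum(ind), replace = TRUE)
  # Assigning through the Date column keeps the "Date" class intact.
  df$deliveryDate[ind] <- df$orderDate[ind] + offsets
  df
}

# Same shape as the example data in the question.
x <- data.frame(
  orderDate    = as.Date(rep("2013-01-01", 5)),
  deliveryDate = as.Date(c("2013-01-02", "2013-01-02", "1990-12-31",
                           "2013-01-02", "1990-12-31"))
)

set.seed(0)  # only to make the sketch reproducible
fixed <- fix_delivery_dates(x)
```

The exact dates produced depend on the seed and on the R version, since the default sampling algorithm changed in R 3.6.0.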

In the first case, ifelse on its own returns plain numbers (the internal representation of a "Date", i.e. days since 1970-01-01) because it drops the class attribute, so we have to set the class back to "Date" ourselves.
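
To see the class being dropped by ifelse, here is a minimal illustration. The vector d and the replacement date 2013-01-15 are arbitrary values chosen for the demo, not taken from the question's data.

```r
d <- as.Date(c("2013-01-02", "1990-12-31"))

# ifelse() builds its result from the logical test vector, so attributes of
# the yes/no arguments (including the "Date" class) are not carried over.
raw <- ifelse(d == as.Date("1990-12-31"), as.Date("2013-01-15"), d)
class(raw)  # "numeric": days since 1970-01-01

# Either of these turns the numbers back into dates.
restored_a <- structure(raw, class = "Date")
restored_b <- as.Date(raw, origin = "1970-01-01")
```

The logical-index replacement avoids this step entirely, because it assigns into the existing Date column instead of building a new vector.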