AbsoluteBeginner AbsoluteBeginner - 1 month ago 7
R Question

Calculate and output the date of customer's first order

Data:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))


Expected outcome:

DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","2.1.12","3.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
DateOfFirstOrderofCustomer = c("1.1.12", "2.1.12", "3.1.12", "1.1.12", "1.1.12", "3.1.12", "2.1.12", "2.1.12", "1.1.12", "1.1.12"))


For Understanding:

The
orderI
D is continuous. Products orderd from the same
customerID
at the same day get the same
orderID
. When the same customer orders products at another day he/she it´s a new
orderID
.

I want to add an additional column for every row/entry which contains the date of the customer's first order (e.g. customer 1 (
customerID
1) made his first order on
1.1.12
so this date is entered in all orders from this customer). How can we do this?

The original data has about 500k rows: so plz give a solution which needs only little perfomance.

Answer

With just base R functions:

# convert the date column to date-format
DB$orderDate <- as.Date(DB$orderDate, format('%d.%m.%y'))

# get the first date for each customer
DB$DateFirstOrder <- with(DB, ave(orderDate, customerID, FUN = min))

the result is then (using the data of Mike Spencer):

> DB
   orderID  orderDate itemID customerID itemPrice DateFirstOrder
1        1 2012-01-01      2          1      9.99     2012-01-01
2        2 2012-01-04      3          2     14.99     2012-01-04
3        3 2012-01-06      2          3      9.99     2012-01-06
4        4 2012-01-13      5          1     19.99     2012-01-01
5        4 2012-01-13     12          1     29.99     2012-01-01
6        5 2012-01-12      4          3      4.99     2012-01-06
7        6 2012-01-10      2          2      9.99     2012-01-04
8        6 2012-01-10      3          2     14.99     2012-01-04
9        7 2012-01-21      1          1     49.99     2012-01-01
10       8 2012-01-24      5          1     19.99     2012-01-01

For the fastest solution, I would recommend the data.table package. To get the desired result with this package, you need to do:

library(data.table)
setDT(DB)[, orderDate := as.Date(orderDate, format('%d.%m.%y'))
          ][, DateFirstOrder := min(orderDate), by = customerID]
Comments