AbsoluteBeginner - 1 year ago 65
R Question

# Calculate and output the date of customer's first order

Data:

``````DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","1.1.12","1.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))
``````

Expected outcome:

``````DB <- data.frame(orderID = c(1,2,3,4,4,5,6,6,7,8),
orderDate = c("1.1.12","2.1.12","3.1.12","13.1.12","13.1.12","12.1.12","10.1.12","10.1.12","21.1.12","24.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99),
DateOfFirstOrderofCustomer = c("1.1.12", "2.1.12", "3.1.12", "1.1.12", "1.1.12", "3.1.12", "2.1.12", "2.1.12", "1.1.12", "1.1.12"))
``````

For Understanding:

The
`orderI`
D is continuous. Products orderd from the same
`customerID`
at the same day get the same
`orderID`
. When the same customer orders products at another day he/she itÂ´s a new
`orderID`
.

I want to add an additional column for every row/entry which contains the date of the customer's first order (e.g. customer 1 (
`customerID`
1) made his first order on
`1.1.12`
so this date is entered in all orders from this customer). How can we do this?

The original data has about 500k rows: so plz give a solution which needs only little perfomance.

With just base R functions:

``````# convert the date column to date-format
DB\$orderDate <- as.Date(DB\$orderDate, format('%d.%m.%y'))

# get the first date for each customer
DB\$DateFirstOrder <- with(DB, ave(orderDate, customerID, FUN = min))
``````

the result is then (using the data of Mike Spencer):

``````> DB
orderID  orderDate itemID customerID itemPrice DateFirstOrder
1        1 2012-01-01      2          1      9.99     2012-01-01
2        2 2012-01-04      3          2     14.99     2012-01-04
3        3 2012-01-06      2          3      9.99     2012-01-06
4        4 2012-01-13      5          1     19.99     2012-01-01
5        4 2012-01-13     12          1     29.99     2012-01-01
6        5 2012-01-12      4          3      4.99     2012-01-06
7        6 2012-01-10      2          2      9.99     2012-01-04
8        6 2012-01-10      3          2     14.99     2012-01-04
9        7 2012-01-21      1          1     49.99     2012-01-01
10       8 2012-01-24      5          1     19.99     2012-01-01
``````

For the fastest solution, I would recommend the data.table package. To get the desired result with this package, you need to do:

``````library(data.table)
setDT(DB)[, orderDate := as.Date(orderDate, format('%d.%m.%y'))
][, DateFirstOrder := min(orderDate), by = customerID]
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download