user3292755 - 2 months ago
R Question

Create date column in R

I obtained data from an RDBMS with SQL and want to forecast daily purchases using R.

Here are the first 12 rows of the data:
[image: first 12 rows of the data]

What I want is to store the data in a dataframe like the one in the image below, with one row per item title. In the end I will try to write a function that forecasts each item title using exponential smoothing.
[image: purpose of dataframe]

So far, I have successfully built the title column, but I cannot make the multiple date columns exactly like in the 2nd image above. Here is the code so far:

df1 <- data.frame()
dailydate <- as.Date(as.POSIXct(data$date_placed))
newdate <- unique(dailydate)
itemtitle <- as.character(data$title)
newitemtitle <- unique(itemtitle)
df1 <- data.frame(newitemtitle,t(dailydate))
Error in data.frame(newitemtitle, t(dailydate))


I cannot add new columns to df1, and I have not yet found a way to match the daily quantity to each title. I am open to any suggestions for this problem.

Answer

This is a good place to use the reshape2 package.

df1 <- structure(list(title = structure(c(5L, 3L, 6L, 1L, 7L, 2L, 1L, 
4L, 8L, 3L), .Label = c("d", "k", "m", "n", "q", "t", "u", "v"
), class = "factor"), quantity = c(4L, 3L, 5L, 10L, 6L, 13L, 
4L, 6L, 12L, 1L), date_placed = structure(c(1L, 1L, 1L, 2L, 2L, 
3L, 3L, 4L, 5L, 5L), .Label = c("8/24/2013", "8/25/2013", "8/26/2013", 
"8/27/2013", "8/28/2013"), class = "factor")), .Names = c("title", 
"quantity", "date_placed"), row.names = c(NA, -10L), class = "data.frame")

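# install.packages("reshape2")
# One row per title, one column per date; title/date pairs with no order are filled with 0.
# dcast is exported by reshape2, so `::` is enough (`:::` is only needed for internal functions).
reshape2::dcast(df1, title ~ date_placed, value.var = "quantity", fill = 0)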
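
Two things may be worth handling before casting real data. Dates stored as "m/d/Y" strings sort alphabetically, so "12/1/2013" would come before "8/24/2013", and an item ordered more than once on the same day gives several rows per title/date pair. Here is a minimal base R sketch of one way to prepare the data. The `orders` data frame and its values are made up for illustration, and the `day` column name is my own choice:

```r
# Toy data in the same shape as df1 (values invented for illustration)
orders <- data.frame(
  title = c("q", "m", "d", "d", "m"),
  quantity = c(4L, 3L, 10L, 2L, 1L),
  date_placed = c("8/24/2013", "8/24/2013", "8/25/2013", "8/25/2013", "12/1/2013"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into real Date values so they order chronologically
orders$day <- as.Date(orders$date_placed, format = "%m/%d/%Y")

# Collapse repeated title/day pairs into a single daily total
daily <- aggregate(quantity ~ title + day, data = orders, FUN = sum)

daily
```

`daily` should then be usable in the same dcast call with `title ~ day` as the formula, so the date columns appear in calendar order.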

Result:

#  title 8/24/2013 8/25/2013 8/26/2013 8/27/2013 8/28/2013
#1     d         0        10         4         0         0
#2     k         0         0        13         0         0
#3     m         3         0         0         0         1
#4     n         0         0         0         6         0
#5     q         4         0         0         0         0
#6     t         5         0         0         0         0
#7     u         0         6         0         0         0
#8     v         0         0         0         0        12

The benefit of this over the other answer is that the output is a data.frame that can now be manipulated as you wish, instead of a table.
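
As an example of that kind of manipulation, the per-title forecast mentioned in the question could be sketched roughly as below. This is only a sketch under a few assumptions: `forecast_next` is a hypothetical helper name, the smoothing weight `alpha = 0.5` is an arbitrary choice rather than a tuned value, and the toy `wide` table just copies three rows from the result above. It uses simple exponential smoothing via `HoltWinters` from base R's stats package with the trend and seasonal components switched off.

```r
# Toy wide table in the layout shown in the result: one row per title, one column per day
wide <- data.frame(
  title = c("d", "k", "m"),
  `8/24/2013` = c(0, 0, 3),
  `8/25/2013` = c(10, 0, 0),
  `8/26/2013` = c(4, 13, 0),
  `8/27/2013` = c(0, 0, 0),
  `8/28/2013` = c(0, 0, 1),
  check.names = FALSE
)

# Hypothetical helper: one-step-ahead forecast from simple exponential smoothing.
# alpha is fixed here as an assumption; leave it NULL to let HoltWinters estimate it.
forecast_next <- function(x, alpha = 0.5) {
  fit <- HoltWinters(ts(x), alpha = alpha, beta = FALSE, gamma = FALSE)
  as.numeric(predict(fit, n.ahead = 1))
}

# Apply the helper to every row, using only the date columns
day_cols <- setdiff(names(wide), "title")
wide$forecast <- apply(wide[day_cols], 1, forecast_next)

wide[, c("title", "forecast")]
```

With only five days per item, any such forecast is rough. The point is just that each row of the data.frame can be treated as its own time series.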