xineers - 1 year ago
R Question

R arules preparing dataset for transactions

I prepared a data set to read as transactions with the arules package in R. However, one of my pre-processing steps causes a problem when I call itemFrequencyPlot: the highest-frequency item is " ". Does anyone have suggestions for resolving this?

Original data:

data <- data.frame(matrix(NA, nrow = 10, ncol = 3))
colnames(data) <- c("Customer", "OrderDate", "Product")
data$Customer <- c("John", "John", "John", "Tom", "Tom", "Tom", "Sally", "Sally", "Sally", "Sally")
data$OrderDate <- c("1-Oct", "2-Oct", "2-Oct", "2-Oct","2-Oct", "2-Oct", "3-Oct", "3-Oct", "3-Oct", "3-Oct")
data$Product <- c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk", "Bread", "Butter", "Eggs", "Wine")

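I then apply the following transformation:

library(dplyr)     # group_by, mutate, %>%
library(reshape2)  # dcast (assumed; data.table also provides a dcast)

newdata <- data %>%
  group_by(Customer, OrderDate) %>%
  mutate(ProductValue = paste0("Product", 1:n())) %>%
  dcast(Customer + OrderDate ~ ProductValue, value.var = "Product")

newdata[is.na(newdata)] <- " "   # fill the empty (NA) product slots with a blank
newdata <- newdata[, 3:6]        # keep only the Product1..Product4 columns
# convert character columns to factors
newdata[sapply(newdata, is.character)] <- lapply(newdata[sapply(newdata, is.character)], as.factor)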

I then used write.table to create a CSV file without column names, to be read by arules:

write.table(newdata, "transactions.csv", row.names = FALSE, col.names = FALSE, sep = ",")

Using the arules package to read the CSV file as transactions:


transactiondata <- read.transactions("transactions.csv", sep = ",", format = "basket")

This does not work: it throws an error. After reading previous questions on Stack Overflow, I was able to resolve it as follows:

transactiondata <- read.transactions("transactions.csv", sep = ",", format = "basket", rm.duplicates = TRUE)

itemFrequencyPlot(transactiondata, topN = 5)

The resulting plot shows " " as the most frequent item, which is not the case in reality and is an artifact of my data pre-processing. Suggestions for resolving this would be greatly appreciated!

Answer

I would do it this way (following the examples in the manual page for transactions):

data_list <- split(data$Product, paste(data$OrderDate, data$Customer))
trans <- as(data_list, "transactions")
inspect(trans)

    items                    transactionID
[1] {Milk}                   1-Oct John   
[2] {Bread,Eggs}             2-Oct John   
[3] {Butter,Eggs,Milk}       2-Oct Tom    
[4] {Bread,Butter,Eggs,Wine} 3-Oct Sally
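To see why this route avoids the " " item, it may help to look at what split() returns before the coercion to transactions. The following is a minimal base-R sketch that rebuilds the question's example data; the data.frame() construction is just one convenient way to write it and is not taken from the question.

```r
# Rebuild the example data from the question (base R only).
data <- data.frame(
  Customer  = c("John", "John", "John", "Tom", "Tom", "Tom",
                "Sally", "Sally", "Sally", "Sally"),
  OrderDate = c("1-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct", "2-Oct",
                "3-Oct", "3-Oct", "3-Oct", "3-Oct"),
  Product   = c("Milk", "Eggs", "Bread", "Butter", "Eggs", "Milk",
                "Bread", "Butter", "Eggs", "Wine"),
  stringsAsFactors = FALSE
)

# One list element per (OrderDate, Customer) pair. Each element holds only
# the products actually bought, so no " " filler is ever introduced.
data_list <- split(data$Product, paste(data$OrderDate, data$Customer))

str(data_list)       # a named list of character vectors of varying length
lengths(data_list)   # basket sizes: 1, 2, 3, 4
```

Because the baskets are allowed to have different lengths, there is nothing to pad, and the list can be passed to as(data_list, "transactions") as is.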

itemFrequencyPlot(trans, topN = 5)
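If a basket file on disk is still wanted (for example to keep using read.transactions), one possible approach is to write one line per transaction with only the items that exist, instead of a padded rectangular table. This is a rough sketch under assumptions: the list literal repeats the baskets that split() gives for the question's data, basket_file is a hypothetical temporary path, and the arules part is skipped when the package is not installed.

```r
# The baskets that split() produces for the question's data, written out
# as a literal so this sketch is self-contained.
data_list <- list(
  "1-Oct John"  = c("Milk"),
  "2-Oct John"  = c("Eggs", "Bread"),
  "2-Oct Tom"   = c("Butter", "Eggs", "Milk"),
  "3-Oct Sally" = c("Bread", "Butter", "Eggs", "Wine")
)

# One line per transaction, items separated by commas, no padding cells.
basket_lines <- vapply(data_list, paste, character(1), collapse = ",")

basket_file <- tempfile(fileext = ".csv")   # hypothetical path for illustration
writeLines(basket_lines, basket_file)
readLines(basket_file)

# Reading the file back requires arules; skipped if it is not available.
if (requireNamespace("arules", quietly = TRUE)) {
  trans_from_file <- arules::read.transactions(basket_file, sep = ",",
                                               format = "basket")
  print(arules::itemFrequency(trans_from_file))
}
```

Since no line contains repeated filler cells, rm.duplicates = TRUE should not be needed for a file written this way.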

Hope this helps!
