Dataminer - 25 days ago
R Question

Creating a User-Item Matrix for Collaborative Filtering

I am attempting to run a Collaborative Filtering (CF) algorithm on "User-Item-Rating" data. My data is in long format, i.e. each row holds one user's rating of one specific item. I need to convert this into a "User-Item" matrix before I can apply a CF algorithm to it.

I am using the spread function from the tidyr package for this task. But since I have more than 50k unique items, the resulting data frame would be huge. R is unable to build it on my local machine and throws the "cannot allocate vector of size" error.

What's the best way to deal with this? I explored a few options but could not get them to work:


  • I wondered whether the output of the spread call can be returned as a sparse matrix.

  • I also checked whether packages that implement CF, such as recommenderlab, have an option to deal with this, but I could not find one.



Any help will be greatly appreciated.

Thanks!

Answer

Since your data is (probably) sparse, go with a sparse matrix. Here's an example with 50,000 example ratings:

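library(stringi)
library(Matrix)
set.seed(1)
# 50,000 example ratings: one random 4-character item ID per user.
# Items must be a factor: since R 4.0 data.frame() keeps strings as character,
# so as.integer() and levels() would not work on them otherwise.
df <- data.frame(item = as.factor(stri_rand_strings(50000, 4)))
df$user <- as.factor(seq_len(nrow(df)))
df$rating <- sample(1:10, nrow(df), replace = TRUE)
# The factor codes are the row/column indices; the levels become the dimnames.
m <- sparseMatrix(
  i = as.integer(df$user),
  j = as.integer(df$item),
  x = df$rating,
  dimnames = list(levels(df$user), levels(df$item))
)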
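
The example above gives every user exactly one rating. As a minimal sketch of the same idea on long-format data where users and items repeat, assuming hypothetical column names user, item and rating (substitute your own), something like this should work:

```r
library(Matrix)

# Hypothetical long-format ratings: one row per (user, item) pair.
ratings <- data.frame(
  user   = c("u1", "u1", "u2", "u3", "u3", "u3"),
  item   = c("a",  "b",  "a",  "b",  "c",  "d"),
  rating = c(5, 3, 4, 2, 5, 1),
  stringsAsFactors = FALSE
)

# Factors map each distinct ID to an integer code usable as a matrix index.
user_f <- factor(ratings$user)
item_f <- factor(ratings$item)

# Only the observed ratings are stored; missing cells take no memory.
# Caution: sparseMatrix() sums x for duplicated (i, j) pairs, so make sure
# each user rates each item at most once before building the matrix.
ui <- sparseMatrix(
  i = as.integer(user_f),
  j = as.integer(item_f),
  x = ratings$rating,
  dims = c(nlevels(user_f), nlevels(item_f)),
  dimnames = list(levels(user_f), levels(item_f))
)

print(ui)
```

The result is a dgCMatrix with users as rows and items as columns. recommenderlab documents coercions to its realRatingMatrix class from both a dgCMatrix and a three-column user/item/rating data frame, so as(ui, "realRatingMatrix") is worth trying once the package is loaded; check this against your installed version.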