I found a couple of posts where users wonder why they receive NaN values in their predictions when using ALS. I ran into the same problem and seemingly found the answer, along with an implemented solution and a detailed discussion in the docs:
Note: there was a working link here to the documentation on coldStartStrategy(); however, the documentation was removed, seemingly because of my question.
I thought this would solve the problem. However, even after updating to Spark 2.1.1 (it wasn't working on 2.1.0), I still receive the same error:
TypeError: __init__() got an unexpected keyword argument 'coldStartStrategy'
Here is where I attempt to use the argument:
from pyspark.ml.recommendation import ALS
full_train, full_test = ugr_df.randomSplit([0.7, 0.3], seed=0)
als = ALS(rank=rank, maxIter=maxIter, regParam=lmbda,
          userCol="user_id", itemCol="game_id", seed=seed,
          coldStartStrategy="drop")
optimized_model = als.fit(full_train)
predictions = optimized_model.transform(full_test)
predictions_drop = predictions.dropna()
coldStartStrategy was introduced by SPARK-14489 in Spark 2.2, which hasn't been released yet.
If you want to use it you have to build Spark from source or use developer builds.
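If the same code has to run on both older and newer versions, one option is to pass the argument only when the running version supports it. This is a minimal sketch, not part of the Spark API: the helper names supports_cold_start_strategy and als_kwargs are hypothetical, and the check assumes the version string starts with "major.minor" (as pyspark.__version__ normally does).

```python
import re


def supports_cold_start_strategy(spark_version):
    """Return True if the given Spark version string is 2.2 or later."""
    # Only the leading "major.minor" part is inspected, so suffixes such as
    # "-SNAPSHOT" or ".dev0" are ignored.
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError("unrecognised Spark version: %r" % (spark_version,))
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 2)


def als_kwargs(spark_version, **base_kwargs):
    """Build keyword arguments for ALS, adding coldStartStrategy when supported."""
    kwargs = dict(base_kwargs)
    if supports_cold_start_strategy(spark_version):
        kwargs["coldStartStrategy"] = "drop"
    return kwargs
```

With PySpark installed, this could be used as ALS(**als_kwargs(pyspark.__version__, rank=rank, maxIter=maxIter, regParam=lmbda, userCol="user_id", itemCol="game_id", seed=seed)). On 2.1.x the argument is simply left out, so the NaN predictions still have to be dropped by hand.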
na.drop should have the same effect as using the
drop strategy, which is implemented internally as:
case ALSModel.Drop => predictions.na.drop("all", Seq($(predictionCol)))
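A rough PySpark equivalent of that Scala line, for versions where coldStartStrategy is not available, could look like the sketch below. The function name drop_cold_start_rows is hypothetical, and "prediction" is assumed to be the prediction column because it is the default predictionCol of ALS. Unlike a bare dropna(), which removes rows with a null or NaN in any column, this restricts the check to the prediction column.

```python
def drop_cold_start_rows(predictions, prediction_col="prediction"):
    """Drop rows whose prediction is null or NaN.

    `predictions` is expected to be the DataFrame returned by
    ALSModel.transform(); `prediction_col` is an assumption and should
    match the model's predictionCol.
    """
    # how="all" with a single-column subset drops a row only when that
    # one column is null/NaN, mirroring na.drop("all", Seq(predictionCol)).
    return predictions.na.drop(how="all", subset=[prediction_col])
```

In the question's code this would replace the last line: predictions_drop = drop_cold_start_rows(predictions).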