Guforu - 2 months ago
Scala Question

Implementation of ALS in Spark

I am currently working with the ALS implementation in Spark. Under the directory /org/apache/spark/ there are two different packages, ml and mllib. Each package has a subfolder recommendation, and that folder contains the class ALS.scala (mllib additionally contains MatrixFactorizationModel.scala).

My question is: what is the difference between the ml and mllib directories?
For example, I found a tutorial online on using ALS in Apache Spark, and it uses the mllib package. When can I use the ml package instead? Why do we need two different packages, ml and mllib?

Answer

Spark MLlib is currently being reworked. The old, RDD-based classes live in the mllib package; the new ones live in ml. The new classes are based on DataFrames and can be faster thanks to the Tungsten optimizations.
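For comparison, here is a minimal sketch of the old RDD-based mllib API. It assumes a local SparkContext and a handful of made-up ratings; the object name MllibAlsSketch and the rank/iteration/lambda values are illustrative placeholders. The input is an RDD of Rating(user, product, rating), and ALS.train returns the MatrixFactorizationModel mentioned in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

object MllibAlsSketch {
  // Trains an ALS model with the RDD-based mllib API on made-up toy ratings
  def trainModel(sc: SparkContext): MatrixFactorizationModel = {
    val ratings: RDD[Rating] = sc.parallelize(Seq(
      Rating(0, 0, 4.0), Rating(0, 1, 2.0),
      Rating(1, 0, 3.0), Rating(1, 2, 5.0),
      Rating(2, 1, 1.0), Rating(2, 2, 4.0)
    ))
    // ALS.train(ratings, rank, iterations, lambda); values are arbitrary placeholders
    ALS.train(ratings, 2, 10, 0.01)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mllib-als").setMaster("local[*]"))
    try {
      val model = trainModel(sc)
      // Predicted rating of user 0 for product 2 (a pair not in the training data)
      println(model.predict(0, 2))
    } finally {
      sc.stop()
    }
  }
}
```

The key difference from the ml version is the data container: mllib works on RDD[Rating] and exposes a static-style ALS.train, while ml works on DataFrame columns through an Estimator/Model pair.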

In general you should use the ml package whenever possible, because the mllib package is expected to be deprecated and eventually removed.

Edit: I don't have a link to a full tutorial, but here is the ALS code I use:

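import org.apache.spark.ml.recommendation.ALS

// rank, iterationNumber, lambda, trainingDataFrame and dataFrameToPredict
// are assumed to be defined elsewhere.
// The rating column defaults to "rating"; call .setRatingCol(...) if yours differs.
val als = new ALS()
    .setUserCol("userCol")        // column with integer user IDs
    .setItemCol("itemCol")        // column with integer item IDs
    .setRank(rank)                // number of latent factors
    .setMaxIter(iterationNumber)  // number of ALS iterations
    .setRegParam(lambda)          // regularization parameter

val model = als.fit(trainingDataFrame)
// transform adds a "prediction" column to the input DataFrame
val predictions = model.transform(dataFrameToPredict)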
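For a self-contained version of the ml code, here is a sketch assuming Spark 2.0 or later (for SparkSession). The ratings DataFrame is made up, its column names (userCol, itemCol, rating) mirror the snippet, and the object name MlAlsSketch plus the rank, iteration count and regularization values are hypothetical placeholders.

```scala
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MlAlsSketch {
  // Builds a tiny made-up ratings DataFrame with integer user/item IDs
  def toyRatings(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0, 0, 4.0), (0, 1, 2.0),
      (1, 0, 3.0), (1, 2, 5.0),
      (2, 1, 1.0), (2, 2, 4.0)
    ).toDF("userCol", "itemCol", "rating")
  }

  // Fits the DataFrame-based ALS estimator; returns an ALSModel
  def fitModel(training: DataFrame): ALSModel =
    new ALS()
      .setUserCol("userCol")
      .setItemCol("itemCol")
      .setRatingCol("rating") // the default name, set explicitly for clarity
      .setRank(2)
      .setMaxIter(10)
      .setRegParam(0.01)
      .fit(training)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ml-als").master("local[*]").getOrCreate()
    try {
      val training = toyRatings(spark)
      val model = fitModel(training)
      // transform appends a "prediction" column to the input rows
      model.transform(training).show()
    } finally {
      spark.stop()
    }
  }
}
```

Because ALS in ml is an Estimator, the fitted ALSModel can also be placed in a Pipeline or tuned with CrossValidator, which the RDD-based mllib API does not support directly.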