Christian Yonathan S. - 3 months ago
Scala Question

Could anyone explain this code?

According to this link: https://github.com/amplab/training/blob/ampcamp6/machine-learning/scala/solution/MovieLensALS.scala

I don't understand the point of:

val numUsers = ratings.map(_._2.user).distinct.count
val numMovies = ratings.map(_._2.product).distinct.count


What does _._2.[user|product] mean?

Answer

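ratings is a collection (a Spark RDD) of tuples of the form (timestamp % 10, Rating(userId, movieId, rating)), as the code that builds it shows. The first underscore in _._2.user is placeholder syntax for the argument of the anonymous function passed to map, that is, the element currently being processed. Here each element is a tuple (a pair of values). For a pair t, you can refer to its first and second elements with the shorthand t._1 and t._2. So _._2 selects the second element of the current tuple, which is the Rating, and .user or .product then reads that field from it. In other words, ratings.map(_._2.user) is the same as ratings.map(t => t._2.user). Calling distinct.count on the result gives the number of unique users (or movies) in the data set.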

val ratings = sc.textFile(movieLensHomeDir + "/ratings.dat").map { line =>
  val fields = line.split("::")
  // format: (timestamp % 10, Rating(userId, movieId, rating))
  (fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
}
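
To see the shorthand in isolation, here is a minimal sketch that runs without Spark. It assumes a plain Scala List in place of the RDD and a hypothetical stand-in Rating case class with the same field names (user, product, rating) as the one used above; the sample values and the object name TupleShorthandDemo are made up for illustration. Note that on a List the size method is used, whereas on an RDD you would call count.

```scala
// Stand-in for the Rating class used above (an assumption, defined here
// so that the example runs without Spark on the classpath).
case class Rating(user: Int, product: Int, rating: Double)

object TupleShorthandDemo {
  // Same element shape as above: (timestamp % 10, Rating(userId, movieId, rating)).
  // The values are made-up sample data.
  val ratings: List[(Long, Rating)] = List(
    (3L, Rating(1, 10, 4.0)),
    (7L, Rating(1, 20, 3.5)),
    (7L, Rating(2, 10, 5.0))
  )

  // Placeholder form, as in the question: the first _ is the current tuple,
  // ._2 is its second element (the Rating), .user is a field of that Rating.
  val numUsers: Int = ratings.map(_._2.user).distinct.size

  // The same computation with the tuple given an explicit name.
  val numUsersExplicit: Int = ratings.map(pair => pair._2.user).distinct.size

  // The same idea with pattern matching, ignoring the first tuple element.
  val numMovies: Int = ratings.map { case (_, r) => r.product }.distinct.size

  def main(args: Array[String]): Unit =
    println(s"users: $numUsers, movies: $numMovies") // prints "users: 2, movies: 2"
}
```

All three forms do the same thing; the placeholder version is just the shortest way to write it when the argument is used only once.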