João_testeSW - 2 months ago
Scala Question

Spark 1.6 - Association Rules algorithm - Cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])

I have this code to find association rules:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val data = sc.textFile("FILE");

val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','));

val ar = new AssociationRules()
  .setMinConfidence(0.8)
val results = ar.run(transactions)

results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "]," + rule.confidence)
}


But I'm getting this error:

<console>:50: error: overloaded method value run with alternatives:
[Item](freqItemsets: org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])org.apache.spark.api.java.JavaRDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]] <and>
[Item](freqItemsets: org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.FPGrowth.FreqItemset[Item]])(implicit evidence$1: scala.reflect.ClassTag[Item])org.apache.spark.rdd.RDD[org.apache.spark.mllib.fpm.AssociationRules.Rule[Item]]
cannot be applied to (org.apache.spark.rdd.RDD[Array[String]])
val results = ar.run(transactions)


How can I transform this RDD into the type that AssociationRules needs?

Many thanks!

Answer

AssociationRules.run expects an RDD[FreqItemset[Item]] (frequent itemsets with their counts), not an RDD of raw transactions. You will first have to run FPGrowth on the transactions to create an FPGrowthModel, and then pass its freqItemsets to AssociationRules, like below:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
import org.apache.spark.rdd.RDD

val data = sc.textFile("FILE")

val transactions: RDD[Array[String]] = data.map(s => s.trim.split(','))

val fpg = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)

// FPGrowth mines the frequent itemsets from the raw transactions
val model = fpg.run(transactions) // creates the FPGrowthModel

val ar = new AssociationRules()
  .setMinConfidence(0.8)

// model.freqItemsets is an RDD[FreqItemset[String]], which is the type run expects
val results = ar.run(model.freqItemsets)
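
To inspect the output you can reuse the same collect-and-print pattern from your question. A small sketch, assuming the Spark 1.6 MLlib field names items, freq, antecedent, consequent and confidence:

// Print each frequent itemset with its count
model.freqItemsets.collect().foreach { itemset =>
  println(itemset.items.mkString("[", ",", "]") + ", " + itemset.freq)
}

// Print each rule with its confidence, as in your original loop
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",")
    + "=>"
    + rule.consequent.mkString(",") + "], " + rule.confidence)
}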