Jary zhen - 2 months ago
Scala Question

Spark 1.6.2 with Scala 2.10.6: No TypeTag available

I'm trying to run the KMeans example from here.

This is my code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SQLContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName(this.getClass.getName).setMaster("local[10]") //.set("spark.sql.warehouse.dir", "file:///")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // Creates a DataFrame
  val dataset: DataFrame = sqlContext.createDataFrame(Seq(
    (1, Vectors.dense(0.0, 0.0, 0.0)),
    (2, Vectors.dense(0.1, 0.1, 0.1)),
    (3, Vectors.dense(0.2, 0.2, 0.2)),
    (4, Vectors.dense(9.0, 9.0, 9.0)),
    (5, Vectors.dense(9.1, 9.1, 9.1)),
    (6, Vectors.dense(9.2, 9.2, 9.2))
  )).toDF("id", "features")

  // Trains a k-means model
  val kmeans = new KMeans()
    .setK(2)
    .setFeaturesCol("features")
    .setPredictionCol("prediction")
  val model = kmeans.fit(dataset)

  // Shows the result
  println("Final Centers: ")
  model.clusterCenters.foreach(println)
}


The error follows:

Information:2016/9/19 0019 PM 3:36 - Compilation completed with 1 error and 0 warnings in 2s 454ms
D:\IdeaProjects\de\src\main\scala\com.te\KMeansExample.scala
Error:Error:line (18)No TypeTag available for (Int, org.apache.spark.mllib.linalg.Vector)
val dataset: DataFrame = sqlContext.createDataFrame(Seq(


Some details:

1. When I build this with Spark 1.6.2 and Scala 2.10.6, compilation fails with the error above. But when I change the Scala version to 2.11.0, it runs OK.

2. I run this code in Hue, which submits the job to my cluster through Livy. My cluster is built with Spark 1.6.2 and Scala 2.10.6.

Can anybody help me? Thanks.

hue

Answer

I am not sure about the cause of this problem, but I think it is because Scala reflection in older versions of Scala was not able to work out the TypeTag of a function argument whose type had not yet been inferred.

In this case,

val dataset: DataFrame = sqlContext.createDataFrame(Seq(
  (1, Vectors.dense(0.0, 0.0, 0.0)),
  (2, Vectors.dense(0.1, 0.1, 0.1)),
  (3, Vectors.dense(0.2, 0.2, 0.2)),
  (4, Vectors.dense(9.0, 9.0, 9.0)),
  (5, Vectors.dense(9.1, 9.1, 9.1)),
  (6, Vectors.dense(9.2, 9.2, 9.2))
)).toDF("id", "features")

The argument Seq((1, Vectors.dense(0.0, 0.0, 0.0)), ...) is seen by Scala for the first time at the call site, so its type has not been inferred yet. As a result, Scala reflection cannot work out the associated TypeTag.

So my guess is that if you move the Seq out into its own val, which lets Scala infer its type first, it will work.
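To make the role of the TypeTag concrete, here is a minimal Spark-free sketch. The `describe` method is a hypothetical stand-in that only mirrors the shape of `createDataFrame[A <: Product : TypeTag](data: Seq[A])`; it is not Spark code, and it does not reproduce the 2.10.6 failure. It only shows that the compiler has to supply a TypeTag for the element type at each call site, whichever way the Seq is passed.

```scala
import scala.reflect.runtime.universe.{TypeTag, typeTag}

object TypeTagSketch {
  // Hypothetical stand-in with the same context bound as createDataFrame:
  // the compiler must find an implicit TypeTag[A] wherever this is called.
  def describe[A <: Product : TypeTag](data: Seq[A]): String =
    typeTag[A].tpe.toString

  def main(args: Array[String]): Unit = {
    // Inline argument: A is inferred at the call site itself.
    println(describe(Seq((1, "a"), (2, "b"))))

    // Extracted val: the type of rows is already fixed before the call.
    val rows = Seq((1, "a"), (2, "b"))
    println(describe(rows))

    // Explicit annotation: nothing is left to inference.
    val annotated: Seq[(Int, String)] = Seq((1, "a"), (2, "b"))
    println(describe(annotated))
  }
}
```

On a recent Scala version all three calls should compile. The guess in this answer is only that the first form is the one that gave older compilers trouble.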

val vectorSeq = Seq(
  (1, Vectors.dense(0.0, 0.0, 0.0)),
  (2, Vectors.dense(0.1, 0.1, 0.1)),
  (3, Vectors.dense(0.2, 0.2, 0.2)),
  (4, Vectors.dense(9.0, 9.0, 9.0)),
  (5, Vectors.dense(9.1, 9.1, 9.1)),
  (6, Vectors.dense(9.2, 9.2, 9.2))
)

val dataset: DataFrame = sqlContext.createDataFrame(vectorSeq).toDF("id", "features")
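A variant of the same idea, offered as an untested assumption rather than a confirmed fix: write the element type out explicitly, both on the val and as the type argument of createDataFrame, so nothing is left for the compiler to infer. The object name `ExplicitTypeSketch` and the method `toDataset` are made up for illustration, and the Seq is shortened to two rows. The sketch assumes Spark 1.6.x on the classpath and a SQLContext created as in the question.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ExplicitTypeSketch {
  // Element type written out in full, so it is known before createDataFrame is called.
  val rows: Seq[(Int, Vector)] = Seq(
    (1, Vectors.dense(0.0, 0.0, 0.0)),
    (4, Vectors.dense(9.0, 9.0, 9.0))
  )

  // The SQLContext is passed in; building it is unchanged from the question.
  def toDataset(sqlContext: SQLContext): DataFrame =
    sqlContext.createDataFrame[(Int, Vector)](rows).toDF("id", "features")
}
```

If the compiler still reports "No TypeTag available" with the type spelled out like this, type inference is probably not the cause, and it would be worth checking that the Scala version of the project matches the one Spark was built against.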