S.Kang - 10 months ago
Scala Question

sortBy is not a member of org.apache.spark.rdd.RDD

Hello, I'm learning Spark.
I ran the code below in spark-shell.

val data = sc.parallelize(Array(Array(1,2,3), Array(2,3,4), Array(1,2,1)))
res6: org.apache.spark.rdd.RDD[Array[Int]] = ParallelCollectionRDD[0] at parallelize at <console>:26

data.map(x => (x(d), 1)).reduceByKey((x,y) => x + y).sortBy(_._1)
res9: Array[(Int, Int)] = Array((1,2), (2,1))

It works. But when I build the same code with sbt assembly, it does not compile.

The error message is:

[error] value sortBy is not a member of org.apache.spark.rdd.RDD[(Int, Int)]

[error] data.map(x => (x(d), 1)).reduceByKey((x,y) => x + y).sortBy(_._1) <= here is the problem.

My build.sbt is:

import AssemblyKeys._


name := "buc"

version := "0.1"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.0" % "provided"

Is something wrong with my build?

Answer Source

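The first problem is that you are compiling against Spark 1.0.0. If you read the API documentation for that version, you won't find a sortBy method in the RDD class; it only appears in later releases. The code presumably worked in spark-shell because the Spark installation behind your shell is newer than the 1.0.0 artifact in your build. So you should update the dependency from 1.0.x to 2.0.x.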
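If upgrading is not possible right away, one possible workaround is sortByKey, which pair RDDs already expose in the 1.0.x API. Because the sort key here is the first element of each tuple, it should give the same ordering as sortBy(_._1). This is a minimal sketch, not the asker's actual program: the object name SortByKeyWorkaround and the helper countByColumn are hypothetical, and d is passed as a parameter because the question does not show how it is defined.

```scala
// On Spark 1.x before 1.3 this import is what brings reduceByKey and
// sortByKey into scope for RDDs of pairs (spark-shell adds it for you).
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

object SortByKeyWorkaround {
  // Hypothetical helper: counts rows by the value found in column d,
  // then orders the (value, count) pairs by value.
  def countByColumn(data: RDD[Array[Int]], d: Int): RDD[(Int, Int)] =
    data.map(x => (x(d), 1))
      .reduceByKey((x, y) => x + y)
      .sortByKey() // ascending by key, i.e. by the tuple's first element
}
```

sortByKey only sorts by the key, so it replaces sortBy only when the sort function is _._1, as it is in the question.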

Second, the spark-mllib dependency is for the Spark MLlib library, and that's not what you need here. RDD is defined in spark-core, so depend on that directly:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0" % "provided"
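With that dependency in place, the code from the question should compile as a standalone application. The following is a sketch under a few assumptions: the object name Buc and the helper countByColumn are hypothetical, the column index 0 is inferred from the output shown in the question, and the jar is launched with spark-submit, which supplies the master URL (the dependency is marked "provided", so Spark itself is not bundled into the assembly).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Buc {
  // Hypothetical helper: counts rows by the value in column d and
  // sorts the (value, count) pairs by value using RDD.sortBy.
  def countByColumn(data: RDD[Array[Int]], d: Int): RDD[(Int, Int)] =
    data.map(x => (x(d), 1))
      .reduceByKey((x, y) => x + y)
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // No setMaster here: spark-submit is expected to provide it.
    val sc = new SparkContext(new SparkConf().setAppName("buc"))
    try {
      val data = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
      // collect() brings the sorted pairs back to the driver as an Array.
      countByColumn(data, 0).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

For the sample data this should print the same pairs the question shows from spark-shell, (1,2) and (2,1). It is probably safest to pick the Spark version in build.sbt to match the cluster you submit to, since "provided" means the cluster's own Spark jars are used at runtime.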