
Run Scala Spark with SBT

The code below causes Spark to become unresponsive:

import org.apache.spark.{SparkConf, SparkContext}

object GroupByTest {

  System.setProperty("hadoop.home.dir", "H:\\winutils")

  val sparkConf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
  val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) {

    val text_file = sc.textFile("h:\\data\\details.txt")

    // classic word count: split each line into words, pair each word with 1,
    // then sum the counts per word
    val counts = text_file
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    println(counts)
  }
}
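
As an aside, reduceByKey is a transformation, so no job runs until an action is called, and println(counts) only prints the RDD's toString rather than its contents. A minimal sketch of how the result could actually be printed, assuming the same counts value as above:

// collect() is an action: it triggers the computation and brings the
// (word, count) pairs back to the driver; fine for small inputs only
counts.collect().foreach(println)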


I'm setting hadoop.home.dir in order to avoid the error mentioned here: "Failed to locate the winutils binary in the hadoop binary path"

This is what my build.sbt file looks like:

lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "1.6.0"
)


Should the Scala Spark code be compilable/runnable with the sbt configuration above?

I think the code itself is fine; it was taken verbatim from http://spark.apache.org/examples.html, but I am not sure whether the Hadoop winutils path is required.

Update: the solution was to use fork := true in the main build.sbt.
Here is the reference: Spark: ClassNotFoundException when running hello world example in scala 2.11
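
For completeness, a minimal sketch of that change, appended to the same build.sbt (fork := true makes sbt launch the application in a separate JVM rather than inside the sbt process):

// run the Spark application in a forked JVM instead of the sbt JVM
fork := true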

Answer

This is the content of my build.sbt. Note that if your internet connection is slow, the first build may take some time while sbt downloads the Spark dependencies.

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1",
  "org.apache.spark" %% "spark-sql" % "1.6.1",
  "org.slf4j" % "slf4j-api" % "1.7.12"
)


run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

In the main I added this; the exact path depends on where you placed the winutils folder.

System.setProperty("hadoop.home.dir", "c:\\winutil")
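
For context, here is a minimal sketch of where that call would sit in the driver (the object name is arbitrary, and the path assumes winutils.exe is located at c:\winutil\bin\winutils.exe, since Hadoop looks for the binary under bin):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Must run before the SparkContext is created, otherwise Hadoop still
    // reports the missing winutils binary.
    System.setProperty("hadoop.home.dir", "c:\\winutil")

    val sc = new SparkContext(
      new SparkConf().setAppName("GroupBy Test").setMaster("local[1]"))

    // ... job code goes here ...

    sc.stop()
  }
}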
