blue-sky - 8 months ago
Scala Question

Run Scala Spark with SBT

The code below causes Spark to become unresponsive:

System.setProperty("hadoop.home.dir", "H:\\winutils")

val sparkConf = new SparkConf().setAppName("GroupBy Test").setMaster("local[1]")
val sc = new SparkContext(sparkConf)

def main(args: Array[String]): Unit = {

  val text_file = sc.textFile("h:\\data\\details.txt")

  val counts = text_file
    .flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
}

I'm setting hadoop.home.dir in order to avoid the error mentioned here: Failed to locate the winutils binary in the hadoop binary path
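
For reference, the transformation chain in the code above is a word count. A minimal sketch of the same flatMap / map / reduce-by-key logic on a plain Scala collection, which can be checked without Spark or winutils, might look like this. The object and method names are hypothetical and not part of the original code; `groupBy` plus a sum stands in for `reduceByKey`, which plain collections do not have.

```scala
object WordCountSketch {
  // Hypothetical helper: the same steps as the RDD pipeline, on a local Seq.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // split each line into words
      .map(word => (word, 1))             // pair every word with a count of 1
      .groupBy(_._1)                      // local stand-in for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

Unlike this local version, the RDD chain is lazy: flatMap, map and reduceByKey are transformations, so Spark only computes something once an action such as collect() is called on counts.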

This is what my build.sbt file looks like:

lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0"
  )

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "1.6.0"
)

Should the Scala Spark code above compile and run with this build.sbt?

I think the code is fine, as it was copied verbatim, but I am not sure whether the Hadoop WinUtils path is required.

Update: "The solution was to use fork := true in the main build.sbt"
Here is the reference: Spark: ClassNotFoundException when running hello world example in scala 2.11
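
As a rough sketch of what that fix could look like, here is the build.sbt from the question with fork := true added. The project name and version numbers are simply copied from the question; putting the dependency inside settings(...) is one possible layout, not necessarily the one used in the referenced answer.

```scala
// build.sbt (sketch; values copied from the question)
lazy val root = (project in file(".")).
  settings(
    name := "hello",
    version := "1.0",
    scalaVersion := "2.11.0",
    // Run the application in a separate JVM instead of inside sbt's own JVM.
    fork := true,
    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.0"
  )
```

With fork := true, sbt run starts the program in a fresh JVM, so it no longer shares sbt's own class loaders.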


This is the content of my build.sbt. Note that if your internet connection is slow, it might take some time to download the dependencies.

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1",
  "org.apache.spark" %% "spark-sql" % "1.6.1",
  "org.slf4j" % "slf4j-api" % "1.7.12"
)

run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))

In main I added the following line. The path depends on where you placed the winutil folder.

System.setProperty("hadoop.home.dir", "c:\\winutil")
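
Since the right path differs per machine, one option is to set the property only when the folder really contains the binary. The sketch below is an assumption about how one might do that, not something from the original setup: WinUtilsSetup, hasWinUtils and configure are hypothetical names. It relies on Hadoop looking for winutils.exe in the bin subfolder of hadoop.home.dir.

```scala
import java.nio.file.{Files, Paths}

object WinUtilsSetup {
  // True when <home>/bin/winutils.exe exists as a regular file.
  def hasWinUtils(home: String): Boolean =
    Files.isRegularFile(Paths.get(home, "bin", "winutils.exe"))

  // Sets hadoop.home.dir only on Windows and only if the binary is present.
  // Returns true when the property was set.
  def configure(home: String,
                osName: String = System.getProperty("os.name")): Boolean = {
    val onWindows = osName.toLowerCase.startsWith("windows")
    if (onWindows && hasWinUtils(home)) {
      System.setProperty("hadoop.home.dir", home)
      true
    } else false
  }
}
```

It would be called once at the start of main, before the SparkContext is created, for example WinUtilsSetup.configure("c:\\winutil"). A false return value on Windows then points to a wrong folder rather than to a Spark problem.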
