Aroon - 22 days ago
Scala Question

Execute Apache Spark (Scala) code in Bash script

I am a newbie to Spark and Scala.
I want to execute some Spark code from inside a bash script. I wrote the following code.

The Scala code was written in a separate .scala file as follows.

Scala Code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Note: println("x="+args(0), "y="+args(1)) would auto-tuple the two
    // arguments; concatenate them instead to print a plain string.
    println("x=" + args(0) + ", y=" + args(1))
    sc.stop()
  }
}


This is the bash script that invokes the Apache Spark/Scala code.

Bash Code

#!/usr/bin/env bash
ABsize=File_size1
ADsize=File_size2
for i in $(seq 2 "$ABsize")
do
  for j in $(seq 2 "$ADsize")
  do
    Abi=$(sed -n "${i}p" File_Path1)
    Adj=$(sed -n "${j}p" File_Path2)
    scala SimpleApp.scala "$Abi" "$Adj"
  done
done


But then I get the following errors.

Errors:

error: object apache is not a member of package org
import org.apache.spark.SparkContext
       ^
error: object apache is not a member of package org
import org.apache.spark.SparkContext._
       ^
error: object apache is not a member of package org
import org.apache.spark.SparkConf
       ^
error: not found: type SparkConf
val conf = new SparkConf().setAppName("Simple Application")
               ^
error: not found: type SparkContext

The above code works perfectly if the Scala file uses no Spark functions (that is, a pure Scala file), but fails as soon as the Apache Spark imports are present.

What would be a good way to run this from a bash script? Will I have to call the Spark shell to execute the code?

Answer

Set up Spark with the appropriate environment variables, package the code into a jar, and, as @puhlen suggested, run it with spark-submit:

spark-submit --class SimpleApp simple-project_2.11-1.0.jar $Abi $Adj
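A minimal sketch of the corrected driver loop built around that spark-submit call. The jar name is taken from the answer (the artifact sbt package would produce for a "simple-project" build on Scala 2.11); the two parameter files are hypothetical stand-ins for File_Path1/File_Path2, each with a header line, which is why seq starts at 2. The echo makes this a dry run; remove it to actually submit the jobs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parameter files standing in for File_Path1/File_Path2.
printf 'header\n10\n20\n' > params_a.txt
printf 'header\n1\n2\n'  > params_b.txt

# Derive the loop bounds from the files instead of hard-coding them.
ABsize=$(wc -l < params_a.txt)
ADsize=$(wc -l < params_b.txt)

for i in $(seq 2 "$ABsize"); do
  for j in $(seq 2 "$ADsize"); do
    Abi=$(sed -n "${i}p" params_a.txt)   # i-th value from the first file
    Adj=$(sed -n "${j}p" params_b.txt)   # j-th value from the second file
    # Dry run: drop the echo to actually submit each Spark job.
    echo spark-submit --class SimpleApp simple-project_2.11-1.0.jar "$Abi" "$Adj"
  done
done
```

Quoting "$Abi" and "$Adj" keeps values with spaces intact, and using $(...) instead of backticks avoids the nesting pitfalls of the original script.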