
Spark test with ScalaTest unsuccessful

Here is a test class for a Spark application in Scala using ScalaTest. When I run sbt test I get a java.lang.ExceptionInInitializerError caused by org.apache.spark.SparkException: A master URL must be set in your configuration, and the test is not executed. I don't understand this, since I am setting the master to local when declaring conf. Does anyone have an idea why? Thanks in advance.

The Test Class:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.scalatest._

class SizeByMailboxTest extends FlatSpec with Matchers with BeforeAndAfter {

  val master = "local"
  val appName = "example-spark"
  var sc: SparkContext = _

  // Create a fresh local SparkContext before each test
  before {
    val conf = new SparkConf().setMaster(master).setAppName(appName)
    sc = new SparkContext(conf)
  }

  // Stop the context after each test so the next one can start cleanly
  after {
    if (sc != null) {
      sc.stop()
    }
  }

  behavior of "SizeByMailbox"

  it should "count total content size per mailbox with duplicates" in {
    val sample = Array(
      SizeByMailbox.Message("1", 10, 50),
      SizeByMailbox.Message("2", 5, 60),
      SizeByMailbox.Message("2", 8, 40),
      SizeByMailbox.Message("1", 7, 80)
    )
    val samples = sc.parallelize(sample)
    val sizeById = SizeByMailbox.count(samples)
    sizeById.collect().map(m => SizeByMailbox.MailBox(m.mailboxid, m.totalsize)) should contain allOf (
      SizeByMailbox.MailBox("1", 130),
      SizeByMailbox.MailBox("2", 100)
    )
  }
}


The App:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD

object SizeByMailbox {
  val sc = new SparkContext()

  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  def count(messages: RDD[Message]): RDD[MailBox] = {
    val total_by_mailbox = messages
      .map(m => (m.mailboxid, m.fullcontentoctets))
      .reduceByKey(_ + _)
      .map(m => MailBox(m._1, m._2))
    total_by_mailbox
  }

  def main(args: Array[String]) {
    // ...
  }
}

Answer

You are creating another SparkContext in the App itself, one that has no SparkConf and therefore no master URL. Because sc is a field of the object, it is initialized the first time the test references SizeByMailbox, and the failing constructor surfaces as the java.lang.ExceptionInInitializerError you are seeing.

object SizeByMailbox {
  val sc = new SparkContext() // <-- here

You have not posted the stack trace, but that is what I suspect the error to be.
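
A minimal sketch of the fix, assuming the context should only exist when the application actually runs (the body of main is left as a stub, and the appName is taken from your test):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SizeByMailbox {
  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  // A pure transformation: the caller supplies the RDD, so no
  // SparkContext is needed at object-initialization time.
  def count(messages: RDD[Message]): RDD[MailBox] =
    messages
      .map(m => (m.mailboxid, m.fullcontentoctets))
      .reduceByKey(_ + _)
      .map { case (id, total) => MailBox(id, total) }

  def main(args: Array[String]) {
    // Create the context only here; in tests, the `before` block supplies
    // a properly configured local context instead. When submitted as an
    // application, the master is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("example-spark")
    val sc = new SparkContext(conf)
    // ... build an RDD[Message] and call count on it
    sc.stop()
  }
}

With the top-level val sc removed, initializing the object no longer constructs a context, so the test's own local context is the only one in play.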

As a good practice, try not to create more than one active SparkContext in a single JVM; Spark can behave unpredictably when multiple contexts exist.
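
If some shared code really does need a handle on the context, one option, assuming you are on Spark 1.4 or later, is SparkContext.getOrCreate, which returns the already-active context instead of constructing a second one:

import org.apache.spark.{SparkConf, SparkContext}

// Returns the running SparkContext if one exists (for example the one the
// test created in `before`); otherwise builds a new one from this conf.
val sc = SparkContext.getOrCreate(new SparkConf().setAppName("example-spark"))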