Java Question

Unable to start/stop spark in Before and After methods in Unit Test

I'm writing unit tests for my Spark application but am running into an issue. Ideally, I'd like to have @Before and @After methods where I start and stop Spark (rather than doing it in the unit test method itself).

The implementation below works: I'm able to declare the SparkConf and the JavaSparkContext at the beginning of the test and then execute sc.close() at the end.

@Test
public void testMethod() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // tests here

    sc.close();
}


But I would like to set it up so that the start and stop have their own methods

@Before
public void init() {
    SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
    JavaSparkContext sc = new JavaSparkContext(conf);
}

@After
public void close() {
    sc.close();
}


Any ideas why the latter doesn't work? I run into an exception saying that some object is not serializable.

Answer
  1. First of all, sc should be a class field, not a local variable of the init() method.
  2. Secondly, wouldn't it be better to use @BeforeClass and @AfterClass? Spark startup is quite slow (the web UI, cluster, etc. start each time), so if full isolation between tests is not a problem, @BeforeClass is the better choice. However, this is only advice, not the solution to your question.

So, the shortened code:

public class SomeSparkTest implements Serializable {

    private static transient JavaSparkContext jsc;

    @BeforeClass
    public static void init() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
        jsc = new JavaSparkContext(conf);
    }

    @AfterClass
    public static void close() {
        jsc.close();
    }

    @Test
    public void shouldSomethingHappen() {
        // given
        // preparation of RDDs
        JavaRDD<String> rdd = jsc.textFile(...);

        // when
        // actions
        long count = rdd.count();

        // then
        // verify output; assertThat is from FEST Assertions, not necessary
        assertThat(count).isEqualTo(5);
    }
}
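
If you do want full isolation between tests (a fresh context per test, as in your original @Before/@After attempt), the same fix applies: keep the context in a transient instance field instead of a local variable. A minimal sketch, assuming JUnit 4 and plain JUnit assertions (the class and test method names are just illustrative):

import static org.junit.Assert.assertEquals;

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class SomeSparkIsolatedTest implements Serializable {

    // Instance field (not a local variable); transient so it is never serialized
    // together with the test class when Spark ships closures.
    private transient JavaSparkContext jsc;

    @Before
    public void init() {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("testGeoHashAggregation");
        jsc = new JavaSparkContext(conf);
    }

    @After
    public void close() {
        jsc.close();
    }

    @Test
    public void shouldCountElements() {
        JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("a", "b", "c"));
        assertEquals(3L, rdd.count());
    }
}

The trade-off is startup time: every test pays the cost of starting a local Spark context, which is why the @BeforeClass variant above is usually preferable when isolation is not required.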

You can find example test suites like this in the Spark repository itself.

Explanation: any lambda or inner class that you use in a test holds a reference to its outer class, which in this case is the test class. Spark always serializes lambdas/inner classes, even in local cluster mode. If your test class is not serializable, or if it has any non-serializable field, this error will appear. That is why the example above declares the context field transient and makes the test class implement Serializable.
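
To make that concrete, here is a minimal sketch of how the error is typically triggered (the helper class, field, and method names are made up for illustration): the lambda reads an instance field, so it captures this, Spark then has to serialize the whole test object, and the non-serializable field makes that fail.

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CaptureExampleTest implements Serializable {

    // Hypothetical, deliberately non-serializable helper type.
    static class NonSerializableHelper {
        int minLength() { return 2; }
    }

    // Non-transient field of a non-serializable type.
    private NonSerializableHelper helper = new NonSerializableHelper();

    // Set up in @BeforeClass as shown above.
    private static transient JavaSparkContext jsc;

    public void failingExample() {
        JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("a", "bb", "ccc"));

        // The lambda reads the 'helper' instance field, so it captures 'this', i.e. the
        // whole test object. Spark serializes that closure and fails on the
        // non-serializable 'helper' field ("Task not serializable").
        // Declaring 'helper' transient, making it a local variable, or making its type
        // Serializable avoids the error.
        long count = rdd.filter(s -> s.length() >= helper.minLength()).count();
        System.out.println(count);
    }
}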