tamjd1 - 2 months ago
Scala Question

Loading Java Spark config from a YAML file

I have a Java Spark app in which I instantiate a SparkConf object with the required configurations for Spark. Currently, it looks like this:

SparkConf conf = new SparkConf()
        .setAppName(appName)
        .setMaster(master)
        .set("spark.executor.memory", "8g")
        .set(...)


The master and app name come from a YAML file which contains app configurations; the rest of the Spark configurations are hardcoded and set one at a time.

My YAML file also contains these key/value pairs of configurations for Spark, and my other (Python) apps use them directly from there. It looks like this:

spark:
  master: ...
  appname: ...
  conf:
    spark.mesos.executor.home: '/data/spark'
    spark.executor.memory: '8g'
    spark.network.timeout: '420'
    ... other spark configs


I'm wondering if I can use these configs from the YAML file to set the Spark configs in the code automatically, using the setAll() method provided by SparkConf, instead of setting them one at a time.

This is how I'm currently reading the configs from the YAML file, but it's not working:

LinkedHashMap<String, String> sparkConf = new LinkedHashMap<>((Map<String, String>) ((Map) yaml.get("spark")).get("conf"));


How can I load spark: conf from the YAML file so it can be used by the setAll() method? Apparently, the method expects a Scala object of type Traversable<Tuple2<String, String>>.

Answer

You can add the "snakeyaml" dependency to your project to read YAML files in Java:

<dependency>
    <groupId>org.yaml</groupId>
    <artifactId>snakeyaml</artifactId>
    <version>1.17</version>
</dependency>

Now, if you have an "application.yaml" file with the configuration defined as you posted, you can read it and create the SparkConf with the setAll() method in Java, as shown below.

import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.yaml.snakeyaml.Yaml;
import scala.collection.JavaConversions;

Yaml yaml = new Yaml();
InputStream is = MySparkApplication.class.getClassLoader().getResourceAsStream("application.yaml");

// Parse the whole file, then drill down to the nested "spark" and "conf" maps
Map<String, Object> yamlParsers = (Map<String, Object>) yaml.load(is);
LinkedHashMap<String, Object> spark = (LinkedHashMap<String, Object>) yamlParsers.get("spark");
LinkedHashMap<String, String> config = (LinkedHashMap<String, String>) spark.get("conf");

// mapAsScalaMap wraps the Java map as a scala.collection.mutable.Map, which
// satisfies the Traversable<Tuple2<String, String>> parameter of setAll()
SparkConf conf = new SparkConf()
        .setAppName((String) spark.get("appname"))
        .setMaster((String) spark.get("master"))
        .setAll(JavaConversions.mapAsScalaMap(config));
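
Note that scala.collection.JavaConversions has been deprecated since Scala 2.12. If you are on a newer Scala version, a minimal sketch of the same conversion using JavaConverters instead (assuming the same spark and config maps as above) would look like this:

import scala.collection.JavaConverters;

import org.apache.spark.SparkConf;

// JavaConverters is the supported replacement for JavaConversions;
// asScala() yields a scala.collection.mutable.Map, which also satisfies
// the Traversable<Tuple2<String, String>> parameter of setAll()
SparkConf conf = new SparkConf()
        .setAppName((String) spark.get("appname"))
        .setMaster((String) spark.get("master"))
        .setAll(JavaConverters.mapAsScalaMapConverter(config).asScala());

Either way, the resulting conf can then be passed to a context as usual, e.g. new JavaSparkContext(conf).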