Michael Lihs - 28 days ago
Java Question

How to pass parameters / properties to Spark jobs with spark-submit

I am running a Spark job implemented in Java using spark-submit. I would like to pass parameters to this job - e.g. a time-start and a time-end parameter to parametrize the Spark application.

What I tried was using the --conf key=value option of the spark-submit script, but when I try to read the parameter in my Spark job with

sparkContext.getConf().get("key")

I get an exception:

Exception in thread "main" java.util.NoSuchElementException: key

Furthermore, when I use sparkContext.getConf().toDebugString() I don't see my value in the output.

Further notice: since I want to submit my Spark job via the Spark REST service, I cannot use an OS environment variable or the like.

Is there any way to achieve this?

Answer

Since you want to use your own custom properties, you need to place them after the application JAR in the spark-submit call (as in the Spark usage template below, [application-arguments] should be your properties). --conf should only be used for Spark configuration properties.

--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # options
  <application-jar> \
  [application-arguments]   <-- your application arguments go here

So when you run spark-submit .... app.jar key=value, in your main method you will get args[0] as "key=value":

public static void main(String[] args) {
    String firstArg = args[0]; // equal to "key=value"
}

But since you want to use key-value pairs, you need to parse your application arguments somehow.
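As a minimal sketch, assuming the job is submitted with arguments like spark-submit ... app.jar time-start=2016-01-01 time-end=2016-01-31 (the class name and the example values are illustrative, not from your setup):

import java.util.HashMap;
import java.util.Map;

public class MySparkJob {
    public static void main(String[] args) {
        // collect all "key=value" application arguments into a map
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            String[] parts = arg.split("=", 2);
            if (parts.length == 2) {
                params.put(parts[0], parts[1]);
            }
        }

        String timeStart = params.get("time-start"); // e.g. "2016-01-01"
        String timeEnd = params.get("time-end");     // e.g. "2016-01-31"
        // ... build your SparkConf / JavaSparkContext here and use the values
    }
}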

You can check out the Apache Commons CLI library or some alternative.
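For example, a sketch using Commons CLI (assuming version 1.3+ for DefaultParser; the class name and short option letters are illustrative, the long option names mirror the time-start/time-end parameters from your question):

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class MySparkJob {
    public static void main(String[] args) throws ParseException {
        // define the expected application arguments
        Options options = new Options();
        options.addOption("s", "time-start", true, "start of the time window");
        options.addOption("e", "time-end", true, "end of the time window");

        // parse whatever spark-submit passed after the application JAR,
        // e.g. ... app.jar --time-start 2016-01-01 --time-end 2016-01-31
        CommandLineParser parser = new DefaultParser();
        CommandLine cmd = parser.parse(options, args);

        String timeStart = cmd.getOptionValue("time-start");
        String timeEnd = cmd.getOptionValue("time-end");
        // ... build your SparkConf / JavaSparkContext here and use the values
    }
}

Either way, these stay plain application arguments, so they are passed through unchanged whether you submit with the spark-submit script or via the REST interface.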