郭同jetNLP - 1 year ago
Java Question

I can't understand how Spark lets Python run on YARN. How does ProcessBuilder deal with the zip file?

The steps are:

1. Package all the Python files into a zip archive when building Spark.

2. spark-submit to YARN distributes the zip archive to all the machines.

3. The Spark worker finds the zip archive and processes the Python files in it.

But the code here and here shows that it only puts the zip file's path into the ProcessBuilder's environment, and I haven't found any code that unzips the archive.

So I'm wondering: how does ProcessBuilder unzip the archive?
Or how does the Spark worker run the Python files inside it?


In fact, if you type python -h, it will show:

Other environment variables:
PYTHONPATH   : ':'-separated list of directories prefixed to the default module search path.  The result is sys.path.

So ProcessBuilder does not unzip anything: it only sets PYTHONPATH for the child process, and Python can use the zip without it being unzipped.

Also, Python can import from a zip file directly, so you don't need to unzip it.
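To see the zip import on its own, here is a minimal sketch in plain Python. It is not Spark's code; the archive name deps.zip and the module name greet_mod are made up for the illustration. It builds a small zip, puts the zip's path on sys.path, and imports from it without extracting anything.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip holding one module.
# "deps.zip" and "greet_mod" are hypothetical names used only for this demo.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "deps.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("greet_mod.py", "def greet(name):\n    return 'hello ' + name\n")

# Put the zip file itself (not an extracted directory) on the search path.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()

# The import system reads the module straight out of the archive.
greet_mod = importlib.import_module("greet_mod")

print(greet_mod.greet("spark"))
print(greet_mod.__file__)  # a path that points inside deps.zip
```

The second print shows that the module's file lives inside the archive, which is why no unzip step appears anywhere.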

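import java.util.ArrayList;
import java.util.List;
import java.util.Map;

List<String> commands = new ArrayList<>();
commands.add("test"); // in
// Pass the command to the builder; a ProcessBuilder with no command cannot start.
ProcessBuilder pb = new ProcessBuilder(commands);
Map<String, String> workerEnv = pb.environment();
workerEnv.put("PYTHONPATH", "/path/to/");
Process worker = pb.start();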
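The same idea can be tried end to end without Java. The sketch below is an illustration only, not what Spark actually runs: it uses Python's subprocess module in place of ProcessBuilder, and the names deps.zip and worker_mod are hypothetical. The parent only sets PYTHONPATH to the zip's path in the child's environment; the child interpreter does the rest.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def run_with_zip_on_pythonpath():
    """Start a child Python whose PYTHONPATH points at a zip, never unzipping it."""
    # Hypothetical archive and module names, created only for this demo.
    tmp_dir = tempfile.mkdtemp()
    zip_path = os.path.join(tmp_dir, "deps.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("worker_mod.py", "def answer():\n    return 6 * 7\n")

    # Equivalent of workerEnv.put("PYTHONPATH", ...): copy the current
    # environment and point PYTHONPATH at the archive itself.
    env = dict(os.environ)
    env["PYTHONPATH"] = zip_path

    # The child imports worker_mod directly from the zip.
    result = subprocess.run(
        [sys.executable, "-c", "import worker_mod; print(worker_mod.answer())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


child_output = run_with_zip_on_pythonpath()
print(child_output)  # 42
```

Nothing in the parent process opens or extracts the archive, which matches what the question observed in the worker launch code.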