user330612 user330612 -4 years ago 295
Linux Question

python subprocess module hangs for spark-submit command when writing STDOUT

I have a Python script that submits Spark jobs using the spark-submit tool. I want to execute the command and write its output to both STDOUT and a log file in real time. I'm using Python 2.7 on an Ubuntu server.

This is what I have so far in my script:


import subprocess

# Submit the command and echo its output to the screen and the log file
def submitJob(cmd, log_file):
    with open(log_file, 'w') as fh:
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        while True:
            output = process.stdout.readline()
            # An empty read after the process has exited means there is no more output
            if output == '' and process.poll() is not None:
                break
            if output:
                print output.strip()
                fh.write(output)
        rc = process.poll()
        return rc

if __name__ == "__main__":
    cmdList = ["dse", "spark-submit", "--spark-master", "spark://", "--class", "com.spark.myapp", "./myapp.jar"]
    log_file = "/tmp/out.log"
    exit_status = submitJob(cmdList, log_file)
    print "job finished with status ", exit_status

The strange thing is, when I execute the same command directly in the shell it works fine and produces output on screen as the program proceeds.

So it looks like something is wrong in the way I'm using subprocess.PIPE for stdout and writing the file.

What's the current recommended way to use the subprocess module to write to stdout and a log file in real time, line by line? I see a bunch of options on the internet but I'm not sure which is correct or most up to date.


Answer

I figured out what the problem was. I was redirecting both stdout and stderr to the pipe so that both would be displayed on screen. This seems to block stdout when stderr output is present. If I remove the stderr=subprocess.STDOUT argument from Popen, it works fine. So for spark-submit it looks like you don't need to redirect stderr explicitly, as it already does this implicitly.
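
For reference, here is a minimal sketch of what the script might look like with that change applied. It is an illustration, not the exact code I ran: the function name submit_job is hypothetical, and I'm assuming text-mode pipes via universal_newlines=True so the same code should run on both Python 2.7 and Python 3. Because stderr is no longer redirected, the child process inherits the parent's stderr, so its error output still reaches the terminal but does not go into the log file.

```python
import subprocess
import sys


def submit_job(cmd, log_file):
    """Run cmd, echo each stdout line to the screen and to log_file,
    and return the process exit code."""
    with open(log_file, "w") as fh:
        # Only stdout is piped. stderr is left alone, so the child
        # inherits this process's stderr.
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # read text lines instead of bytes
        )
        # readline() returns "" only at end of file, so the loop ends
        # once the child has closed its stdout.
        for line in iter(process.stdout.readline, ""):
            sys.stdout.write(line)
            sys.stdout.flush()
            fh.write(line)
            fh.flush()  # keep the log file current while the job runs
        process.stdout.close()
        # wait() blocks until the child exits and returns its exit code.
        return process.wait()
```

Calling it with the command list from the question, for example submit_job(cmdList, "/tmp/out.log"), would return the exit status of spark-submit. Flushing after every line is what keeps both the screen and the log file up to date while the job is still running.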
