I have a python script that is used to submit spark jobs using the spark-submit tool. I want to execute the command and write the output both to STDOUT and a logfile in real time. i'm using python 2.7 on a ubuntu server.
This is what I have so far in my SubmitJob.py script
# Submit the command
def submitJob(cmd, log_file):
with open(log_file, 'w') as fh:
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output = process.stdout.readline()
if output == '' and process.poll() is not None:
rc = process.poll()
if __name__ == "__main__":
cmdList = ["dse", "spark-submit", "--spark-master", "spark://127.0.0.1:7077", "--class", "com.spark.myapp", "./myapp.jar"]
log_file = "/tmp/out.log"
exist_status = submitJob(cmdList, log_file)
print "job finished with status ",exist_status
Figured out what the problem was. I was trying to redirect both stdout n stderr to pipe to display on screen. This seems to block the stdout when stderr is present. If I remove the stderr=stdout argument from Popen, it works fine. So for spark-submit it looks like you don't need to redirect stderr explicitly as it already does this implicitly