ealeon ealeon - 2 months ago 27
Python Question

Python subprocess.Popen poll seems to hang but communicate works

child = subprocess.Popen(command,
                         shell=True,
                         env=environment,
                         close_fds=True,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT,
                         stdin=sys.stdin,
                         preexec_fn=os.setsid
                         )

child_interrupted = False
while child.poll() is None:
    if Signal.isInterrupted():
        child_interrupted = True
        os.killpg(os.getpgid(child.pid), signal.SIGTERM)
        break
    time.sleep(0.1)

subout = child.communicate()[0]
logging.info(subout)


The above works for most of the commands it executes (about 90%), but for some commands it hangs.

For the commands that repeatedly hang, if I get rid of the block below, it works fine:

child_interrupted = False
while child.poll() is None:
    if Signal.isInterrupted():
        child_interrupted = True
        os.killpg(os.getpgid(child.pid), signal.SIGTERM)
        break
    time.sleep(0.1)


I'm assuming that, for those hanging commands, child.poll() is None even though the job is finished?

communicate() can tell the process is finished but poll() can't?

I've run ps -ef on those processes, and they are defunct only when the child.poll() code is in place.

Any idea why?

It looks like defunct means "That's a zombie process, it's finished but the parent hasn't wait()ed for it yet."
Well, I'm polling to see if I can call wait/communicate...

Answer

You've set the Popen object to receive the subprocess's stdout via a pipe. The problem is that you're not reading from that pipe until the process exits. If the process produces enough output to fill the OS-level pipe buffer and you don't drain the pipe, you're deadlocked: the subprocess is blocked waiting for you to read the output it's writing so that it can continue to write and then exit, while you're waiting for it to exit before you read the output. That is also why removing the poll loop fixes it: communicate() reads the pipe while it waits, so the buffer never fills.
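To see the stall in isolation, here is a minimal sketch. It assumes the child writes far more than the OS pipe buffer holds (1 MB here, while the buffer is commonly tens of kilobytes), and the helper name demo_pipe_stall and its parameters are made up for illustration. The parent polls for a bounded time without reading, then lets communicate() drain the pipe.

```python
import subprocess
import sys
import time


def demo_pipe_stall(n_bytes=1_000_000, wait_seconds=1.0):
    """Return (still_running_before_read, bytes_read, returncode).

    Hypothetical helper: the child writes n_bytes to stdout, which is
    assumed to be larger than the OS pipe buffer.
    """
    child_code = "import sys; sys.stdout.write('x' * %d)" % n_bytes
    child = subprocess.Popen([sys.executable, "-c", child_code],
                             stdout=subprocess.PIPE)

    # Poll without reading the pipe, but give up after wait_seconds
    # instead of looping forever as the code in the question does.
    deadline = time.monotonic() + wait_seconds
    while child.poll() is None and time.monotonic() < deadline:
        time.sleep(0.05)
    still_running = child.poll() is None

    # communicate() reads the pipe while waiting, so the child can finish.
    out, _ = child.communicate()
    return still_running, len(out), child.returncode
```

With the defaults, the child should still be running when the polling window ends, because it is blocked on its write, and it should exit normally once communicate() starts reading.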

If your explicit poll and interrupt checking is necessary, the easiest solution to this deadlock is probably to launch a thread that drains the pipe:

... launch the thread just after Popen is called ...

draineddata = []
# Trivial thread just reads lines from stdout into the list
drainerthread = threading.Thread(target=draineddata.extend, args=(child.stdout,))
drainerthread.daemon = True
drainerthread.start()

... then where you had been doing communicate, change it to: ...
child.wait()
drainerthread.join()
subout = b''.join(draineddata)  # Combine the data read back to a single output
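Putting the pieces together, a self-contained sketch of the whole pattern might look like the following. The function name run_and_capture and the is_interrupted callback are hypothetical stand-ins for your own code (the callback plays the role of Signal.isInterrupted()), and child.terminate() is swapped in for the os.killpg call only to keep the sketch portable; keep your process-group kill if you need it.

```python
import subprocess
import sys
import threading
import time


def run_and_capture(argv, is_interrupted=lambda: False, poll_interval=0.1):
    """Run argv, draining its output in a thread while polling for exit.

    Returns (output_bytes, returncode, interrupted).
    """
    child = subprocess.Popen(argv,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)

    # The drainer thread iterates over child.stdout line by line and
    # appends each line to the list, so the pipe buffer never fills up.
    chunks = []
    drainer = threading.Thread(target=chunks.extend, args=(child.stdout,))
    drainer.daemon = True
    drainer.start()

    interrupted = False
    while child.poll() is None:
        if is_interrupted():
            interrupted = True
            child.terminate()  # stand-in for os.killpg(..., signal.SIGTERM)
            break
        time.sleep(poll_interval)

    child.wait()     # reap the child so it does not stay defunct
    drainer.join()   # the thread ends once the pipe reaches EOF
    return b"".join(chunks), child.returncode, interrupted
```

For example, run_and_capture([sys.executable, "-c", "print('hello')"]) should return the child's output as bytes together with its exit status, even for commands that print far more than the pipe buffer can hold.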