David Doria - 1 month ago
Python Question

Do you have to check exit_status_ready if you are going to check recv_ready()?

I am running a remote command with:

ssh = paramiko.SSHClient()
ssh.connect(host)
stdin, stdout, stderr = ssh.exec_command(cmd)


Now I want to get the output. I have seen things like this:

# Wait for the command to finish
while not stdout.channel.exit_status_ready():
    if stdout.channel.recv_ready():
        stdoutLines = stdout.readlines()


But sometimes that never seems to run readlines(), even when there should be data on stdout. To me that suggests stdout.channel.recv_ready() is not necessarily True by the time stdout.channel.exit_status_ready() becomes True.

Is something like this appropriate?

# Wait until the data is available
while not stdout.channel.recv_ready():
    pass

stdoutLines = stdout.readlines()


That is, do I really first have to check the exit status before waiting for recv_ready() to say the data is ready?

How would I know if there is supposed to be data on stdout before waiting in an infinite loop for stdout.channel.recv_ready() to become True (which it does not if there is not supposed to be any stdout output)?

Answer

That is, do I really first have to check the exit status before waiting for recv_ready() to say the data is ready?

No. It is perfectly fine to receive data (e.g. stdout/stderr) from the remote process before it has finished. Also, some sshd implementations do not provide the exit status of the remote process at all, in which case you'll run into problems; see the paramiko documentation for exit_status_ready.

The problem with waiting on exit_status_ready() for short-lived remote commands is that your local thread may receive the exit status before you first check the loop condition. In that case you never enter the loop and readlines() is never called. Here's an example:

# spawns new thread to communicate with remote
# executes whoami which exits pretty fast
stdin, stdout, stderr = ssh.exec_command("whoami") 
time.sleep(5)  # main thread waits 5 seconds
# command already finished, exit code already received
#  and set by the exec_command thread.
# therefore the loop condition is not met 
#  as exit_status_ready() already returns True 
#  (remember, remote command already exited and was handled by a different thread)
while not stdout.channel.exit_status_ready():
    if stdout.channel.recv_ready():
        stdoutLines = stdout.readlines()
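
One way to sidestep the race shown above is to sample the exit flag first and then drain whatever is buffered, so output is still read when the command has already finished. The following is a minimal sketch under assumptions, not a pattern taken from the paramiko documentation: `drain_stdout`, `FakeChannel`, the poll interval, and the buffer size are hypothetical names and values, and the fake only mimics the three channel methods used here so the snippet runs without a server. It also assumes the server delivers all output before the exit status, and it does not read stderr.

```python
import time


def drain_stdout(channel, poll_interval=0.05, bufsize=4096):
    """Collect stdout bytes until the exit status has arrived and the buffer is empty."""
    chunks = []
    while True:
        # Sample the exit flag *before* draining, so data that was buffered
        # ahead of the exit status is still picked up on this pass.
        finished = channel.exit_status_ready()
        while channel.recv_ready():
            chunks.append(channel.recv(bufsize))
        if finished:
            break
        time.sleep(poll_interval)  # avoid spinning while the command is quiet
    return b"".join(chunks)


class FakeChannel:
    """Hypothetical stand-in for a channel whose command has already exited."""

    def __init__(self, data):
        self._data = data

    def exit_status_ready(self):
        return True  # the exit status arrived before we started looking

    def recv_ready(self):
        return bool(self._data)

    def recv(self, nbytes):
        chunk, self._data = self._data[:nbytes], self._data[nbytes:]
        return chunk


output = drain_stdout(FakeChannel(b"alice\n"))
```

With a real connection the call would look like `drain_stdout(stdout.channel)`; the loop body runs at least once, so output buffered before the first check is not lost.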

How would I know if there is supposed to be data on stdout before waiting in an infinite loop for stdout.channel.recv_ready() to become True (which it does not if there is not supposed to be any stdout output)?

channel.recv_ready() just indicates that there is unread data in the buffer.

def recv_ready(self):
    """
    Returns true if data is buffered and ready to be read from this
    channel.  A ``False`` result does not mean that the channel has closed;
    it means you may need to wait before more data arrives.
    """

This means recv_ready() can be False because of networking (delayed packets, retransmissions, ...) or simply because the remote process does not write to stdout/stderr regularly. Using recv_ready() as the loop condition can therefore make your code return prematurely: within a single run it is perfectly normal for it to yield True at some moments (the remote process wrote to stdout and your local channel thread received that output) and False at others (e.g. the remote process is sleeping and not writing to stdout).
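
To make the premature return concrete, here is a small simulation. `ScriptedChannel` and `read_while_ready` are hypothetical names, not paramiko classes or functions: the fake replays a list of chunks in which `None` marks a moment when nothing happens to be buffered.

```python
class ScriptedChannel:
    """Hypothetical stand-in that replays chunks; None marks a quiet moment."""

    def __init__(self, script):
        self._script = list(script)

    def recv_ready(self):
        if self._script and self._script[0] is None:
            self._script.pop(0)  # the quiet moment passes
            return False
        return bool(self._script)

    def recv(self, nbytes):
        # The fake ignores nbytes and hands back one scripted chunk.
        return self._script.pop(0)

    def unread(self):
        return [chunk for chunk in self._script if chunk is not None]


def read_while_ready(channel):
    """Naive reader: stops at the first moment the buffer happens to be empty."""
    chunks = []
    while channel.recv_ready():
        chunks.append(channel.recv(1024))
    return b"".join(chunks)


# The remote process writes "first", pauses, then writes "second".
channel = ScriptedChannel([b"first\n", None, b"second\n"])
received = read_while_ready(channel)
left_behind = channel.unread()
```

The naive reader returns during the pause, so the second chunk is never collected even though the simulated process was still going to write it.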

Besides that, people occasionally experience paramiko hangs that might be related to the stdout/stderr buffers filling up (potentially the same kind of problem as Popen processes hanging when you never read from stdout/stderr and the internal buffers fill up).

The code below reads stdout/stderr in chunks, emptying the buffers for as long as the channel is open.

import select

def myexec(ssh, cmd, timeout, want_exitcode=False):
  # one channel per command
  stdin, stdout, stderr = ssh.exec_command(cmd) 
  # get the shared channel for stdout/stderr/stdin
  channel = stdout.channel

  # we do not need stdin.
  stdin.close()                 
  # indicate that we're not going to write to that channel anymore
  channel.shutdown_write()      

  # read stdout/stderr in order to prevent read block hangs
  stdout_chunks = []
  stdout_chunks.append(stdout.channel.recv(len(stdout.channel.in_buffer)))
  # chunked read to prevent stalls
  while not channel.closed or channel.recv_ready() or channel.recv_stderr_ready(): 
      # stop if channel was closed prematurely, and there is no data in the buffers.
      got_chunk = False
      readq, _, _ = select.select([stdout.channel], [], [], timeout)
      for c in readq:
          if c.recv_ready(): 
              stdout_chunks.append(stdout.channel.recv(len(c.in_buffer)))
              got_chunk = True
          if c.recv_stderr_ready(): 
              # make sure to read stderr to prevent stall    
              stderr.channel.recv_stderr(len(c.in_stderr_buffer))  
              got_chunk = True  
      # 1) make sure that there are at least 2 cycles with no data in the input
      #    buffers in order to not exit too early (i.e. cat on a >200k file).
      # 2) if no data arrived in the last loop, check if we already received the exit code
      # 3) check if input buffers are empty
      # 4) exit the loop
      if not got_chunk \
          and stdout.channel.exit_status_ready() \
          and not stderr.channel.recv_stderr_ready() \
          and not stdout.channel.recv_ready(): 
          # indicate that we're not going to read from this channel anymore
          stdout.channel.shutdown_read()  
          # close the channel
          stdout.channel.close()
          break    # exit as remote side is finished and our buffers are empty

  # close all the pseudofiles
  stdout.close()
  stderr.close()

  if want_exitcode:
      # exit code is always ready at this point
      # recv() returns bytes, so join with a bytes separator (decode if text is needed)
      return (b''.join(stdout_chunks), stdout.channel.recv_exit_status())
  return b''.join(stdout_chunks)

channel.closed is just the ultimate exit condition in case the channel closes prematurely. Right after a chunk is read, the code checks whether the exit status has already been received and no new data was buffered in the meantime. If new data arrived or no exit status has been received yet, the code keeps trying to read chunks. Once the remote process has exited and there is no new data in the buffers, we assume that we've read everything and begin closing the channel. Note that if you want the exit status, you should always wait until it has been received; otherwise paramiko might block forever.

This way the buffers cannot fill up and make your process hang. myexec only returns once the remote command has exited and there is no data left in our local buffers. The code is also a bit more CPU-friendly because it uses select() instead of polling in a busy loop, but it might be a bit slower for short-lived commands.
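
The difference between a busy loop and select() can be seen without an SSH server. The sketch below uses a local `socket.socketpair()` as a stand-in for the channel, an assumption made purely so the example runs anywhere; `wait_readable` is a hypothetical helper name.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Block for up to `timeout` seconds; return True if `sock` has data to read."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable


# A connected socket pair stands in for the remote end and the local channel.
remote, local = socket.socketpair()

# Nothing has been written yet: select() sleeps until the timeout and reports
# "not readable" instead of burning CPU in a polling loop.
before = wait_readable(local, timeout=0.1)

remote.sendall(b"hello\n")
# Data is buffered now, so select() reports the socket as readable.
after = wait_readable(local, timeout=1.0)
data = local.recv(1024)

remote.close()
local.close()
```

paramiko `Channel` objects expose `fileno()`, which is what allows `stdout.channel` to be passed to `select.select()` in the code above.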

Just for reference, to safeguard against some infinite loops you can set a channel timeout that fires when no data arrives for a period of time:

 chan.settimeout(timeout)
 chan.exec_command(command)
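
With a timeout set, a blocking `recv()` raises `socket.timeout` instead of waiting indefinitely, which paramiko documents for `Channel.recv()`. The sketch below shows one way to handle that; it uses a local socket pair in place of a real channel so it runs without a server, and `recv_or_none` is a hypothetical helper name.

```python
import socket


def recv_or_none(chan, nbytes=4096):
    """Return the next chunk, or None if the timeout fires first."""
    try:
        return chan.recv(nbytes)
    except socket.timeout:
        return None


# Local stand-in for a channel: nothing has been written to `quiet` yet.
peer, quiet = socket.socketpair()
quiet.settimeout(0.1)  # same method name the answer calls on the channel
result = recv_or_none(quiet)

peer.sendall(b"late data")
result_after_write = recv_or_none(quiet)

peer.close()
quiet.close()
```

Returning `None` on timeout lets the caller decide whether to retry, give up, or close the channel.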