chimeric - 3 months ago
Python Question

Subprocess not working with complex Unix command

I am trying to run a join command within Python, and I'm being foiled by subprocess. I'm combining thousands of large files iteratively, so a dictionary would require a lot of memory. My rationale is that join only has to deal with two files at a time, so my memory overhead will be lower.

I have tried many variations of this to get subprocess to run the command. Can anyone explain why this is not working? When I print cmd and execute it myself in the shell, it runs perfectly.

cmd = "join <(sort %s) <(sort %s)" % (outfile, filename)
with open(out_temp, 'w') as out:
    return_code = subprocess.call(cmd, stdout=out, shell=True)
if return_code != 0:
    print "not working!"
    break


The error produced looks like this:

/bin/sh: -c: line 0: syntax error near unexpected token `('


I have also tried turning the command into a list, but I'm not sure what the rationale is for how to break up the commands. Can anyone explain? outfile and filename are variables.

["join" , "<(sort" , outfile , ") <(sort" , filename , ")"]


Any help would be appreciated! I'm doing this in Python because I'm heavily parsing filenames upstream to figure out which files to combine.

Answer

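<( is process substitution, a bash extension to standard shell syntax. With shell=True, subprocess runs the command through /bin/sh on Unix, and you can see in the error message that it's /bin/sh reporting the syntax error, not /bin/bash. Even if /bin/sh is a link to /bin/bash, bash drops many of its extensions when it's run using that link.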

You can use bash explicitly with:

cmd = "bash -c 'join <(sort %s) <(sort %s)'" % (outfile, filename)
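
On the list form from the question: each list element is handed to the program as one argument and no shell looks at it, so join would just receive the literal text "<(sort" as a file name. If you want a list, the whole bash script has to be a single element after "bash -c". Here is a minimal sketch of that idea. The helper name join_sorted and the "$1"/"$2" convention are my own choices for illustration, not something from the question, and it assumes bash, sort and join are on the PATH.

```python
import subprocess


def join_sorted(left, right, out_path):
    """Run `join <(sort left) <(sort right)` under bash, writing to out_path.

    Returns the exit status reported by bash.
    (Hypothetical helper name, for illustration only.)
    """
    # The script text is fixed. The file names arrive as "$1" and "$2",
    # so spaces or quotes inside a name are not re-parsed by the shell.
    script = 'join <(sort "$1") <(sort "$2")'
    # argv layout: bash -c SCRIPT NAME-FOR-$0 VALUE-FOR-$1 VALUE-FOR-$2
    argv = ["bash", "-c", script, "bash", left, right]
    with open(out_path, "w") as out:
        # No shell=True here: bash is started directly from the list.
        return subprocess.call(argv, stdout=out)
```

In the loop from the question this would be called as join_sorted(outfile, filename, out_temp). If you would rather keep the original string and shell=True, subprocess.call also accepts executable='/bin/bash', which swaps the shell it uses, but then quoting the file names inside the string is up to you.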