Barbot - 29 days ago

Using xargs for parallel Python scripts

I currently have a bash script with two nested loops. The first enumerates possible values for a, and the second enumerates possible values for b, like this:

for a in {1..10}; do
  for b in {1..10}; do
    nohup python "$a" "$b" &
  done
done

So this spawns 100 Python processes, one for each (a,b) pair. However, my machine only has 5 cores, so I want to cap the number of concurrent processes at 5 to avoid thrashing and wasteful context switching. The goal is to always have 5 processes running until all 100 are done.

xargs seems to be one way to do this, but I don't know how to pass these arguments to xargs. I've checked other similar questions but don't understand the surrounding bash jargon well enough to know what's happening. For example, I tried

seq 1 | xargs -i --max-procs=5 bash

but this doesn't seem to do anything: it runs as before and still spawns 100 processes.

I assume I'm misunderstanding how xargs works.



This would actually look more like:

for a in {1..10}; do
  for b in {1..10}; do
    printf '%s\0' "$a" "$b"
  done
done | xargs -0 -x -n 2 -P 5 python

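Note that there's no nohup and no &. To track the number of concurrent invocations, xargs needs to execute the Python script directly, and that process must not exit until its work is complete. A job that backgrounds itself returns immediately, so xargs would consider it finished and start the next one.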
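To try the pipeline without the real script, here is a minimal sketch with the same shape. It assumes seq is available (instead of bash brace expansion, so it also runs under plain sh) and uses a hypothetical inline sh -c worker as a stand-in for python; each invocation just echoes the pair it received.

```shell
#!/bin/sh
# Same pipeline shape as the answer, with a hypothetical stand-in worker:
# sh -c 'echo "$1 $2"' _ prints the two arguments of each invocation.
pairs=$(
  for a in $(seq 1 10); do
    for b in $(seq 1 10); do
      printf '%s\0' "$a" "$b"   # emit one NUL-delimited (a, b) pair
    done
  done | xargs -0 -x -n 2 -P 5 sh -c 'echo "$1 $2"' _
)
# Count the invocations; each one printed exactly one line.
printf '%s\n' "$pairs" | wc -l
```

Because of -P 5 the lines can arrive in any order, but every one of the 100 pairs should appear exactly once.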

The non-standard (but widely available) -0 extension requires the input to be NUL-delimited (as produced by printf '%s\0'); this ensures correct behavior with arguments containing spaces, quotes, backslashes, etc.
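A quick sketch of what -0 buys: the arguments below contain a space, a single quote and a backslash, and each one reaches the command unchanged. printf '[%s]\n' is only a stand-in for the Python script. Without -0, xargs would split on blanks and treat quotes and backslashes specially (GNU xargs, for instance, rejects an unmatched single quote).

```shell
#!/bin/sh
# Pass three awkward arguments NUL-delimited; -n 1 gives one per invocation.
# printf '[%s]\n' (stand-in for the real script) brackets each argument.
out=$(printf '%s\0' 'two words' "it's" 'back\slash' | xargs -0 -n 1 printf '[%s]\n')
printf '%s\n' "$out"
```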

The likewise non-standard -P 5 sets the maximum number of concurrent processes, in a slightly more portable way than --max-procs=5, which GNU xargs supports but modern BSD xargs does not.
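One way to check the cap yourself is a sketch like the following. The inline sh -c worker is a hypothetical stand-in for the Python script: it drops a marker file while it runs, records how many markers exist at that moment, sleeps for a second, then removes its marker before exiting. Since xargs only starts a new job after one exits, the recorded count should never exceed 5. This assumes mktemp and seq are available.

```shell
#!/bin/sh
# Scratch directory: running/ holds one marker per active job,
# counts collects how many markers each job saw when it started.
dir=$(mktemp -d)
mkdir "$dir/running"
# xargs appends one number per job, so inside the worker $1 is the
# scratch directory and $2 is the job number.
seq 1 10 | xargs -n 1 -P 5 sh -c '
  touch "$1/running/$2"
  ls "$1/running" | wc -l >> "$1/counts"
  sleep 1
  rm "$1/running/$2"
' _ "$dir"
max=$(sort -n "$dir/counts" | tail -n 1 | tr -d ' ')
runs=$(wc -l < "$dir/counts" | tr -d ' ')
rm -r "$dir"
echo "jobs run: $runs, most running at once: $max"
```

With 10 one-second jobs and -P 5, the whole run should take roughly two seconds rather than ten.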

The -n 2 option tells xargs to pass at most two arguments to each instance of the Python script; since the input consists of complete (a, b) pairs, this starts one instance per pair.
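To see the grouping in isolation, this small sketch feeds six NUL-delimited items to xargs -n 2 and uses a hypothetical sh -c stand-in that reports how many arguments it got. There is no -P here, so the invocations run one after another and the output order is fixed.

```shell
#!/bin/sh
# Six items, two per invocation: the stand-in worker runs three times.
out=$(printf '%s\0' 1 a 1 b 2 a | xargs -0 -n 2 sh -c 'echo "got $# args: $1 $2"' _)
printf '%s\n' "$out"
```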

The -x (used in conjunction with -n 2) indicates that if a single Python instance can't be given both arguments because together they would exceed the command-line size limit, xargs should treat this as a failure and exit, rather than invoking a Python instance with only one argument. It does not guard against an odd number of input items, but that can't happen here, since the loop always emits complete pairs.
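A short sketch of that limitation: with three items and -n 2, the final invocation still receives a single argument even though -x is given, because -x only reacts to the size limit, not to the end of the input. The sh -c worker is a hypothetical stand-in that prints its argument count.

```shell
#!/bin/sh
# Three items, -n 2 and -x: the leftover item still runs on its own.
out=$(printf '%s\0' a b c | xargs -0 -x -n 2 sh -c 'echo "$#"' _)
printf '%s\n' "$out"
```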