Barbot - 2 months ago
Python Question

Using xargs for parallel Python scripts

I currently have a bash script, script.sh, with two nested loops. The first enumerates possible values for a, and the second enumerates possible values for b, like

#!/bin/bash
for a in {1..10}
do
  for b in {1..10}
  do
    nohup python script.py "$a" "$b" &
  done
done


So this spawns 100 Python processes running script.py, one for each (a, b) pair. However, my machine only has 5 cores, so I want to cap the number of processes at 5 to avoid thrashing and wasteful context switching. The goal is to always have 5 processes running until all 100 are done.

xargs seems to be one way to do this, but I don't know how to pass these arguments to xargs. I've checked other similar questions but don't understand the surrounding bash jargon well enough to know what's happening. For example, I tried

seq 1 | xargs -i --max-procs=5 bash script.sh


but this doesn't seem to do anything - script.sh runs as before and still spawns off 100 processes.

I assume I'm misunderstanding how xargs works.

Thanks!

Answer

This would actually look more like:

#!/bin/bash
for a in {1..10}; do
  for b in {1..10}; do
    printf '%s\0' "$a" "$b"
  done
done | xargs -0 -x -n 2 -P 5 python script.py

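Note that there's neither nohup nor & here. To track the number of concurrent invocations, xargs has to run the Python script directly, and each process it starts must not exit until its work is complete. A command that backgrounds itself returns immediately, so xargs would treat its slot as free and start the next one right away.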
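If the reason for nohup was to let the run survive a logout, one option is to apply it once to the whole pipeline rather than to each Python process. The following is only a sketch under that assumption; run_detached is a made-up helper name and the log file is whatever path you pass in.

```shell
#!/bin/sh
# run_detached LOGFILE CMD [ARGS...]
# Hypothetical helper: start CMD under nohup in the background and send
# both stdout and stderr to LOGFILE. Inside CMD, xargs still runs its
# children in the foreground, so its -P limit keeps working.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
}
```

Assuming the xargs loop is saved as script.sh, it might be started with run_detached run.log bash script.sh, after which $! holds the PID of the detached job.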

The non-standard (but widely available) -0 extension requires input to be in NUL-delimited form (as created with printf '%s\0'); this ensures correct behavior with arguments having spaces, quotes, backslashes, etc.
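A quick way to see what -0 buys you is to swap the Python script for printf and feed in some awkward arguments. This is just an illustrative sketch: show_args is a made-up helper name, and the bracketed output format is only there to make the argument boundaries visible.

```shell
#!/bin/sh
# show_args ARG...
# Hypothetical helper: send each argument NUL-terminated to xargs and
# print them two per line, wrapped in brackets.
show_args() {
    printf '%s\0' "$@" | xargs -0 -n 2 printf '[%s] [%s]\n'
}

show_args 'two words' "it's" 'back\slash' '"quoted"'
# prints:
# [two words] [it's]
# [back\slash] ["quoted"]
```

Each argument arrives intact, including the space, the apostrophe, the backslash and the double quotes.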

The likewise non-standard -P 5 sets the maximum number of processes (in a way slightly more portable than --max-procs=5, which is supported on GNU but not modern BSD xargs).
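Rather than hard-coding 5, the limit could be derived from the machine's core count. The sketch below assumes getconf _NPROCESSORS_ONLN or nproc is available, which is common but not guaranteed, and falls back to 1 otherwise; cpu_count is a made-up helper name.

```shell
#!/bin/sh
# cpu_count
# Hypothetical helper: print the number of online CPUs, trying getconf
# first, then nproc, and falling back to 1 if neither works.
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc 2>/dev/null || echo 1
}

cpu_count
```

The result could then be passed along as -P "$(cpu_count)" in place of -P 5.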

The -n 2 indicates that each instance of the Python script receives only two arguments, thus starting one per pair of inputs.
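To check the pairing before launching anything, you can do a dry run that puts echo in front of the command, so xargs prints each command line instead of executing it. This is a sketch with a smaller 3x2 grid to keep the output short; emit_pairs is a made-up helper name.

```shell
#!/bin/sh
# emit_pairs
# Hypothetical helper: write every (a, b) pair as two NUL-terminated
# values, in the same order as the nested loops.
emit_pairs() {
    for a in 1 2 3; do
        for b in 1 2; do
            printf '%s\0' "$a" "$b"
        done
    done
}

# Dry run: echo shows the command each invocation would execute.
emit_pairs | xargs -0 -n 2 echo python script.py
# prints:
# python script.py 1 1
# python script.py 1 2
# python script.py 2 1
# python script.py 2 2
# python script.py 3 1
# python script.py 3 2
```

Once the lines look right, drop the echo and add -P to run them for real.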

The -x (used in conjunction with -n 2) indicates that if a single Python instance can't be given two arguments (for instance, if the arguments are so long that both can't fit on a single command line), this should be treated as a failure, rather than invoking a Python instance with only one argument.