user3578925 - 1 month ago
Python Question

python multiprocessing sleep between executions

I have a Python script that is supposed to run multiple jobs in parallel. I set the maximum number of processes to 20, but I need the script to sleep 5 seconds between submitting jobs.
So here is my sample code:

#!/usr/bin/env python

import multiprocessing
import subprocess


def prcss(cmd):
    sbc = subprocess.call
    com = sbc(cmd, shell='True')
    return (com)


if __name__=='__main__':

    s = 'sleep 5'
    cmd= []
    for j in range(1,21):
        for i in range(10):
            sis = "nohup ~/mycodes/code > str(j)"+"/"+"out"+str(i)+".dat"
            cmd.append(sis)
            cmd.append(s)

    pool=multiprocessing.Pool(processes=20)
    pool.map(prcss,cmd)


Although I have 'sleep 5' between the 'sis' commands, all jobs start immediately when I run my script. I need a pause between the 'sis' commands because the output of each job depends on the computer clock. So if I run 20 jobs, they all start with the same system clock and hence all produce the same output.

Any idea how to make my script sleep between the 'sis' commands?

Abedin

Answer

Take a look at the docs for pool.map(). When you create a list of items and then submit them to the pool using map, all of the jobs are submitted to the pool together. Since you have 20 worker processes, 20 of your jobs will start (effectively) all at once. That includes both your sis commands and the sleep commands. There's not even a guarantee that they will be executed and complete in the same order, just that you'll receive the results in the same order. The apply_async() function might be better for you, because you can control when jobs are submitted to the pool.
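The behaviour described above can be checked with a small timing sketch. It is only an illustration: timed_job and run_with_map are hypothetical names, the durations are arbitrary, and it uses multiprocessing.pool.ThreadPool (which shares Pool's interface) so that it runs quickly without launching real processes.

```python
import time
from multiprocessing.pool import ThreadPool  # same map()/apply_async() interface as Pool


def timed_job(duration):
    """Hypothetical job: note when it starts, then 'work' for `duration` seconds."""
    started = time.monotonic()
    time.sleep(duration)
    return duration, started


def run_with_map(durations):
    """Hand every job to map() in one go; return (results, spread of start times)."""
    with ThreadPool(processes=len(durations)) as pool:
        results = pool.map(timed_job, durations)
    starts = [started for _, started in results]
    return [d for d, _ in results], max(starts) - min(starts)


durations = [0.6, 0.2, 0.4]
order, spread = run_with_map(durations)
print(order)   # results come back in input order, although the 0.2 s job finished first
print(spread)  # gap between the first and last job start, in seconds
```

With one worker per job, the printed spread should be a tiny fraction of a second, which is why a 'sleep 5' entry in the list does not delay anything: it is just another job that starts alongside the rest.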

It sounds to me like you want your Python script to wait 5 seconds before you issue a sis command anyway, so there's no reason you should need to execute the sleep command in a worker process. Try refactoring into something like this:

import multiprocessing
import subprocess
import time

def prcss(cmd):
  # call subprocess.call directly - the alias added nothing
  # shell takes the boolean True; the string 'True' only works because it is truthy
  com = subprocess.call(cmd, shell=True)
  return com

if __name__=='__main__':

  pool = multiprocessing.Pool(processes=20)
  results_objects = []

  for j in range(1,21):
    for i in range(10):
      sis = 'nohup ~/mycodes/code >'+str(j)+'/'+'out'+str(i)+'.dat'

      # make an asynchronous call that will execute our target function with
      # the sis command
      results_objects.append(pool.apply_async(prcss, args=(sis,)))
      # don't forget the trailing comma in args - it must be a tuple

      # now we pause for five seconds before submitting the next job
      time.sleep(5)

  # close the pool and wait for everything to finish
  pool.close()
  pool.join() 

  # retrieve all of the results
  results = [r.get() for r in results_objects]
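
Before launching 200 real jobs, the submit-then-pause pattern can be tried in a scaled-down dry run. This is a sketch under assumptions: fake_job and submit_staggered are hypothetical names, the delay is shortened to 0.2 seconds, and ThreadPool stands in for multiprocessing.Pool because it exposes the same apply_async() method.

```python
import time
from multiprocessing.pool import ThreadPool  # stand-in with the same interface as Pool


def fake_job(label):
    """Hypothetical stand-in for prcss(): report when the job actually started."""
    return label, time.monotonic()


def submit_staggered(pool, func, items, delay):
    """Submit one item at a time, pausing `delay` seconds after each submission."""
    pending = []
    for item in items:
        pending.append(pool.apply_async(func, args=(item,)))
        time.sleep(delay)
    # get() blocks until each job has finished and returns its result
    return [p.get() for p in pending]


with ThreadPool(processes=4) as pool:
    results = submit_staggered(pool, fake_job, ['a', 'b', 'c'], delay=0.2)

starts = [started for _, started in results]
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
print(gaps)  # each gap should be close to the 0.2 s delay
```

Because the pause happens in the submitting script rather than in a worker, each job starts at a different clock time regardless of how many workers are idle.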

One other note: with syntax highlighting applied, it's easy to see that you're missing a closing quotation mark in your sis string, and probably a '+' too, so str(j) ends up inside the string literal. Instead of manually constructing your string, consider using str.format():

sis = 'nohup ~/mycodes/code > {}/out{}.dat'.format(j, i)

If the slash is there to separate path components, you should use os.path.join():

import os
sis = os.path.join('nohup ~/mycodes/code > {}'.format(j), 'out{}.dat'.format(i))

The first string generated (in either case) will be:

nohup ~/mycodes/code > 1/out0.dat
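
To check the whole command list rather than just the first entry, the two loops can be folded into one helper. This is a minimal sketch: build_commands and its parameters are hypothetical names, and it applies os.path.join() only to the output path, which assumes a POSIX system where the separator is '/'.

```python
import os


def build_commands(n_dirs=20, n_runs=10):
    """Build one shell command per (directory, run) pair, as in the loops above."""
    commands = []
    for j in range(1, n_dirs + 1):
        for i in range(n_runs):
            # join only the path components, then drop the path into the command
            out_path = os.path.join(str(j), 'out{}.dat'.format(i))
            commands.append('nohup ~/mycodes/code > {}'.format(out_path))
    return commands


commands = build_commands()
print(commands[0])    # nohup ~/mycodes/code > 1/out0.dat
print(len(commands))  # 200
```

Note that the shell redirection only succeeds if the directories 1 through 20 already exist.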