MPa MPa - 1 month ago 19
Python Question

Python multiprocessing: Simple job split across many processes

EDIT



The proposed code actually worked! I was simply running it from within an IDE that wasn't showing the outputs.

I'm leaving the question up because the comments/answers are instructive




I need to split a big job across many workers.
In trying to figure out how to do this, I used the following simple example, with code mostly taken from here.
Basically, I am taking a list, breaking it up in shorter sublists (chunks), and asking
multiprocessing
to print the content of each sublist with a dedicated worker:

import multiprocessing
from math import ceil

# Breaking up the long list in chunks:
def chunks(l, n):
return [l[i:i+n] for i in range(0, len(l), n)]

# Some simple function
def do_job(job_id, data_slice):
for item in data_slice:
print("{}_{}".format(job_id, item))


I then do this:

if __name__ == '__main__':

# My "long" list
l = [letter for letter in 'abcdefghijklmnopqrstuvwxyz']

my_chunks = chunks(l, ceil(len(l)/4))


At this point, my_chunks is as expected:

[['a', 'b', 'c', 'd', 'e', 'f', 'g'],
['h', 'i', 'j', 'k', 'l', 'm', 'n'],
['o', 'p', 'q', 'r', 's', 't', 'u'],
['v', 'w', 'x', 'y', 'z']]


Then:

jobs = []
for i, s in enumerate(my_chunks):
j = mp.Process(target=do_job, args=(i, s))
jobs.append(j)
for j in jobs:
print('starting job {}'.format(str(j)))
j.start()


Initially, I wrote the question because I was not getting the expected printouts from the
do_job
function.

Turns out the code works just fine when run from command line.

Answer

Maybe it's your first time with multiprocessing? Do you wait for the processes to exit or do you exit the main processes before your processes have time to complete there job?

from multiprocessing import Process
from string import ascii_letters
from time import sleep


def job(chunk):
    done = chunk[::-1]
    print(done)


def chunk(data, parts):
    divided = [None]*parts
    n = len(data) // parts
    for i in range(parts):
        print(i, n)
        divided[i] = data[i*n:n*(i+1)]
    return divided


def main():
    data = list(ascii_letters)
    workers = 4
    data_chunks = chunk(data, workers)
    ps = []
    for i in range(4):
        w = Process(target=job, args=(data_chunks[i],))
        w.deamon = True
        w.start()
        ps += [w]
    sleep(2)



if __name__ == '__main__':
    main()