MPa MPa - 15 days ago 4x
Python Question

Python multiprocessing: Simple job split across many processes


The proposed code actually worked! I was simply running it from within an IDE that wasn't showing the outputs.

I'm leaving the question up because the comments/answers are instructive

I need to split a big job across many workers.
In trying to figure out how to do this, I used the following simple example, with code mostly taken from here.
Basically, I am taking a list, breaking it up in shorter sublists (chunks), and asking
to print the content of each sublist with a dedicated worker:

import multiprocessing
from math import ceil

# Breaking up the long list in chunks:
def chunks(l, n):
return [l[i:i+n] for i in range(0, len(l), n)]

# Some simple function
def do_job(job_id, data_slice):
for item in data_slice:
print("{}_{}".format(job_id, item))

I then do this:

if __name__ == '__main__':

# My "long" list
l = [letter for letter in 'abcdefghijklmnopqrstuvwxyz']

my_chunks = chunks(l, ceil(len(l)/4))

At this point, my_chunks is as expected:

[['a', 'b', 'c', 'd', 'e', 'f', 'g'],
['h', 'i', 'j', 'k', 'l', 'm', 'n'],
['o', 'p', 'q', 'r', 's', 't', 'u'],
['v', 'w', 'x', 'y', 'z']]


jobs = []
for i, s in enumerate(my_chunks):
j = mp.Process(target=do_job, args=(i, s))
for j in jobs:
print('starting job {}'.format(str(j)))

Initially, I wrote the question because I was not getting the expected printouts from the

Turns out the code works just fine when run from command line.


Maybe it's your first time with multiprocessing? Do you wait for the processes to exit or do you exit the main processes before your processes have time to complete there job?

from multiprocessing import Process
from string import ascii_letters
from time import sleep

def job(chunk):
    done = chunk[::-1]

def chunk(data, parts):
    divided = [None]*parts
    n = len(data) // parts
    for i in range(parts):
        print(i, n)
        divided[i] = data[i*n:n*(i+1)]
    return divided

def main():
    data = list(ascii_letters)
    workers = 4
    data_chunks = chunk(data, workers)
    ps = []
    for i in range(4):
        w = Process(target=job, args=(data_chunks[i],))
        w.deamon = True
        ps += [w]

if __name__ == '__main__':