user7077941 user7077941 - 1 month ago 6
Python Question

How to Do Data Parallelization in Python?

So, I have 3 dimentional list. For example:

A=[[[1,2,3],[4,5,6],[7,8,9]],...,[[2,4,1],[1,4,6],[1,2,4]]]


I want to process each 2 dimentional list in A independently but they all have same process. If I do it in sequence, I do:

for i in range(len(A)):
A[i]=process(A[i])


But, it takes very long time. Could you tell me how to parallel compute by data parallelization in Python?

Answer

If you have multiple cores and processing each 2 dimensional list is expensive operation you could use Pool from multiprocessing. Here's a short example that squares numbers in different process:

import multiprocessing as mp

A = [[[1,2,3],[4,5,6],[7,8,9]],[[2,4,1],[1,4,6],[1,2,4]]]

def square(l):
    return [[x * x for x in sub] for sub in l]

with mp.Pool(processes=mp.cpu_count()) as pool:
    res = pool.map(square, A)

print(res)

Output:

[[[1, 4, 9], [16, 25, 36], [49, 64, 81]], [[4, 16, 1], [1, 16, 36], [1, 4, 16]]]

Pool.map will behave like built-in map while splitting the iterable to worker processes. It also has third parameter called chunksize that defines how big chunks are submitted to workers.

Comments