fubal fubal - 1 year ago 82
Python Question

Easiest way to parallelise a call to map?

Hey I have some code in Python which is basically a World Object with Player objects. At one point the Players all get the state of the world and need to return an action. The calculations the players do are independent and only use the instance variables of the respective player instance.

while True:
    # do stuff, calculate state with the actions array of last iteration
    for i, player in enumerate(players):
        actions[i] = player.get_action(state)

What is the easiest way to run the inner loop in parallel? Or is this a bigger task than I am assuming?

Answer Source

The most straightforward way is to use multiprocessing.Pool.map (which works like the built-in map, except that it takes a single iterable and returns a list):

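import multiprocessing

def do_stuff(player):
    ...  # whatever you do here is executed in another process

if __name__ == "__main__":
    # Create the pool after do_stuff is defined so the worker processes can find it.
    pool = multiprocessing.Pool()
    while True:
        # Results come back in the same order as players.
        actions = pool.map(do_stuff, players)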

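Note, however, that this uses multiple processes. CPython does have threads, but the GIL (global interpreter lock) lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound pure-Python code.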

In most languages parallelization is done with threads, which can access the same data inside your program because they run in the same process. To share data between processes you need IPC (inter-process communication) mechanisms such as pipes, sockets or files, which cost more resources. Spawning processes is also much slower than spawning threads.
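Applied to the loop in the question, this could look roughly like the sketch below. The Player class, its bias attribute, the decision rule and the state update are all made up for illustration; only multiprocessing.Pool.map and functools.partial are real library calls. The wrapper function lives at module level because the function and the players are pickled and sent to the workers over IPC.

```python
import multiprocessing
from functools import partial


class Player:
    """Hypothetical stand-in for the question's Player class."""

    def __init__(self, name, bias):
        self.name = name
        self.bias = bias

    def get_action(self, state):
        # Placeholder decision rule; it only uses this player's own attributes.
        return (state + self.bias) % 3


def get_action_for(state, player):
    # Module-level wrapper so it can be pickled and sent to a worker process.
    return player.get_action(state)


if __name__ == "__main__":
    players = [Player("a", 0), Player("b", 1), Player("c", 2)]
    state = 0
    with multiprocessing.Pool() as pool:
        for _ in range(3):  # stands in for the question's `while True`
            # map preserves input order, so actions[i] belongs to players[i].
            actions = pool.map(partial(get_action_for, state), players)
            state = sum(actions)  # hypothetical state update
            print(state, actions)
```

Creating the pool once, outside the loop, avoids paying the process start-up cost on every iteration. The players are still pickled and sent again on each call, so this only pays off if get_action does a fair amount of work.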

Other solutions include:

  • vectorization: rewrite your algorithm as computations on vectors and matrices and use hardware accelerated libraries to execute it
  • using another Python implementation that doesn't have a GIL (for example Jython or IronPython)
  • implementing your piece of parallel code in another language and calling it from Python

A big issue arises when you have to share data between the processes or threads. In your code, for example, each task accesses actions. If you have to share state, welcome to concurrent programming, a much bigger task and one of the hardest things to get right in software.
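In a case like yours you can often sidestep shared state by having each task return its result instead of writing into a common structure. The sketch below assumes a hypothetical Player whose get_action also modifies the player (the moves counter is invented for illustration). Each worker operates on a pickled copy, so the modified copy is returned explicitly together with the action.

```python
import multiprocessing


class Player:
    """Hypothetical player that counts how often it has acted."""

    def __init__(self, name):
        self.name = name
        self.moves = 0

    def get_action(self, state):
        self.moves += 1  # mutates the instance it is called on
        return state + self.moves


def act(args):
    # In a worker process this runs on a pickled copy of the player.
    player, state = args
    action = player.get_action(state)
    return action, player  # send the updated copy back explicitly


if __name__ == "__main__":
    players = [Player("a"), Player("b")]
    with multiprocessing.Pool(2) as pool:
        results = pool.map(act, [(p, 10) for p in players])
    actions = [action for action, _ in results]
    print([p.moves for p in players])  # the originals in the parent are unchanged
    players = [updated for _, updated in results]  # adopt the returned copies
    print(actions, [p.moves for p in players])
```

Nothing is shared here: the parent builds actions from the returned values, so no locking is needed. The price is that every player is pickled twice per round, once in each direction.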