Gerhard Hagerer Gerhard Hagerer - 1 month ago 20
Linux Question

Python 2.7 concurrent.futures.ThreadPoolExecutor does not parallelize

I am running the following code on a Intel i3-based machine with 4 virtual cores (2 hyperthreads/physical core, 64bit) and Ubuntu 14.04 installed:

n = multiprocessing.cpu_count()
executor = ThreadPoolExecutor(n)
tuple_mapper = lambda i: (i, func(i))
results = dict(executor.map(tuple_mapper, range(10)))


The code does not seem to be executed in a parallel fashion, since the CPU is utilized only 25% constantly. On the utilization graph only one of the 4 virtual cores is used 100% at a time. The utilized cores are alternating every 10 seconds or so.

But the parallelization works well on a server machine with the same software setting. I don't know the exact number of cores nor the exact processor type, but I know for sure that it has several cores and the utilization is at 100% and that the calculations have a rapid speedup (10 times faster after using parallelization, made some experiments with it).

I would expect, that parallelization would work on my machine too, not only on the server.

Why does it not work? Does it have something to do with my operating system settings? Do I have to change them?

Thanks in advance!

Update:
For the background information see the correct answer below. For the sake of completeness, I want to give a sample code which solved the problem:

tuple_mapper = lambda i: (i, func(i))
n = multiprocessing.cpu_count()
with concurrent.futures.ProcessPoolExecutor(n) as executor:
results = dict(executor.map(tuple_mapper, range(10)))


Before you reuse this take care that all functions you are using are defined at the top-level of a module as described here:
Python multiprocessing pickling error

Answer

It sounds like you're seeing the results of Python's Global Interpreter Lock (a.k.a GIL).

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.

As all your threads are running pure Python code, only one of them can actually run in parallel. That should cause only one CPU to be active and matches your description of the problem.

You can get around it by using a multiple processes with ProcessPoolExecutor from the same module. Other solutions include switching to Jython or IronPython which don't have GIL.

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.