Khris - 3 months ago
Python Question

Python multiprocessing/threading takes longer than single processing on a virtual machine

I'm at work on a virtual machine which sits in my company's mainframe.

I have 4 cores assigned to work with so I'm trying to get into parallel processing of my Python code. I'm not familiar with it yet and I'm running into really unexpected behaviour, namely that multiprocessing/threading takes longer than single processing. I can't tell if I'm doing something wrong or if the problem comes from my virtual machine.

Here's an example:

import multiprocessing as mg
import threading
import math
import random
import time

NUM = 4

def benchmark():
    for i in range(1000000):
        math.exp(random.random())

threads = []
random.seed()

print "Linear Processing:"
time0 = time.time()
for i in range(NUM):
    benchmark()
print time.time() - time0

print "Threading:"
for P in range(NUM):
    threads.append(threading.Thread(target=benchmark))
time0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print time.time() - time0

threads = []
print "Multiprocessing:"
for i in range(NUM):
    threads.append(mg.Process(target=benchmark))
time0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print time.time() - time0


The result looks like this:

Linear Processing:
1.125
Threading:
4.56699991226
Multiprocessing:
3.79200005531


Linear processing is the fastest here, which is the opposite of what I wanted and expected.
I'm unsure about how the join calls should be placed, so I also ran the example with the joins inside the start loop, like this:

for t in threads:
    t.start()
    t.join()


Now this leads to output like this:

Linear Processing:
1.11500000954
Threading:
1.15300011635
Multiprocessing:
9.58800005913


Now threading is almost as fast as single processing, while multiprocessing is even slower.
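A minimal sketch (with `time.sleep` standing in for the real work; not part of the original timings) shows what the two join placements actually do:

```python
import threading
import time

def worker():
    time.sleep(0.2)  # stand-in for the real work

# join() inside the start loop: each join() blocks until that thread is
# done, so the three workers run one after another (~0.6s total).
threads = [threading.Thread(target=worker) for _ in range(3)]
t0 = time.time()
for t in threads:
    t.start()
    t.join()
serialized = time.time() - t0

# Start them all first, then join: the workers overlap (~0.2s total).
threads = [threading.Thread(target=worker) for _ in range(3)]
t0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
concurrent = time.time() - t0

print("serialized: %.2fs  concurrent: %.2fs" % (serialized, concurrent))
```

So join-inside-the-loop serializes the workers, which is why that variant of threading runs in about the same time as the linear version.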

When observing processor load in the task manager, the individual load of the four virtual cores never rises above 30%, even during the multiprocessing run, so I suspect a configuration problem here.

I want to know whether I'm benchmarking correctly and whether this behaviour is really as strange as I think it is.

Answer

So, firstly, you're not doing anything wrong. When I run your example on my MacBook Pro with CPython 2.7.12, I get:

$ python test.py
Linear Processing:
0.733351945877
Threading:
1.20692706108
Multiprocessing:
0.256340026855

However, the difference becomes more apparent when I change:

for i in range(1000000):

To:

for i in range(100000000):

The difference is much more noticeable:

Linear Processing:
77.5861060619
Threading:
153.572453976
Multiprocessing:
33.5992660522

Now why is threading consistently slower? Because of the Global Interpreter Lock (GIL): only one thread can execute Python bytecode at a time, so CPU-bound threads take turns instead of running in parallel. The threading module is mainly useful for waiting on I/O. Your multiprocessing example is the correct way to do this.
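To illustrate the I/O point, here is a minimal sketch (an illustration I'm adding, not part of the original answer): four sleeping "I/O" tasks overlap under threading, because a blocked thread releases the GIL:

```python
import threading
import time

def fake_io():
    time.sleep(0.3)  # a blocking "I/O" call; a sleeping thread releases the GIL

t0 = time.time()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - t0
print("4 fake I/O tasks, threaded: %.2fs" % elapsed)  # ~0.3s, not 4 * 0.3s
```

For CPU-bound work like `math.exp`, the same four threads would take at least as long as running them one after another.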

So, in your original example, where Linear Processing was the fastest, I would blame the overhead of starting processes. When the amount of work per process is small, it can easily take longer to start four processes and wait for them to finish than to do the work synchronously in one process. Use a larger workload to get a more realistic benchmark.
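One way to reduce that per-run overhead is to start the worker processes once and reuse them, e.g. with multiprocessing.Pool. A sketch (again, not part of the original answer; the speedup depends on how many cores the VM actually makes available):

```python
import math
import random
import time
from multiprocessing import Pool

def work(n):
    # Same workload as benchmark(), but returning a value so map()
    # has something to collect.
    s = 0.0
    for _ in range(n):
        s += math.exp(random.random())
    return s

if __name__ == "__main__":
    n = 1000000

    t0 = time.time()
    serial = [work(n) for _ in range(4)]
    print("serial: %.2fs" % (time.time() - t0))

    # The four worker processes are started once here and reused for
    # every task handed to map().
    t0 = time.time()
    pool = Pool(4)
    results = pool.map(work, [n] * 4)
    pool.close()
    pool.join()
    print("pool:   %.2fs" % (time.time() - t0))
```

The `if __name__ == "__main__":` guard matters on platforms where child processes re-import the main module (e.g. Windows); without it, each child would try to spawn its own pool.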
