c3cris - 4 months ago 19
Python Question

``````
import random
import threading
import time

def dowork():
    y = []
    z = []
    ab = 0
    start_time = time.time()

    for x in range(0, 1500):
        y.append(random.randint(0, 100000))
    for x in range(0, 1500):
        z.append(random.randint(0, 1000))
    for x in range(0, 100):
        for k in range(0, len(z)):
            ab += y[k] ** z[k]
    print(" %.50s..." % ab)
    print("--- %.6s seconds --- %s" % (time.time() - start_time,
                                       threading.current_thread().name))

# do the work!
for x in range(0, 4):  # 4 threads
    t = threading.Thread(target=dowork)
    t.start()  # and they are off
``````

Results:

``````
 23949968699026357507152486869104218631097704347109...
10632599432628604090664113776561125984322566079319...
20488842520966388603734530904324501550532057464424...
17247910051860808132548857670360685101748752056479...
[Finished in 12.2s]
``````

And now let's do it in 1 thread:

``````
import random
import time

def dowork():
    y = []
    z = []
    ab = 0
    start_time = time.time()

    for x in range(0, 1500):
        y.append(random.randint(0, 100000))
    for x in range(0, 1500):
        z.append(random.randint(0, 1000))
    for x in range(0, 100):
        for k in range(0, len(z)):
            ab += y[k] ** z[k]
    print(" %.50s..." % ab)
    print("--- %.6s seconds ---" % (time.time() - start_time))

for x in range(0, 4):
    dowork()
``````

Results:

``````
 14283744921265630410246013584722456869128720814937...
13487957813644386002497605118558198407322675045349...
15058500261169362071147461573764693796710045625582...
77481355564746169357229771752308217188584725215300...
[Finished in 11.1s]
``````

Why do the single-threaded and multi-threaded scripts have the same processing time?
Shouldn't the multi-threaded implementation take roughly 1/(number of threads) of the time? (I know there are diminishing returns once you exceed your CPU's thread count.)

Did I mess up my implementation?

Multithreading in Python does not work the way it does in most other languages: because of the global interpreter lock (GIL), only one thread executes Python bytecode at a time, so CPU-bound threads run one after another rather than in parallel. There are plenty of workarounds, though; for example, you can use gevent's coroutine-based "threads". I myself prefer dask for work that needs to run concurrently. For example:

``````
import time
import dask.bag as db

start = time.time()
(db.from_sequence(range(4), npartitions=4)
   .map(lambda _: dowork())
   .compute())
print('total time: {} seconds'.format(time.time() - start))

start = time.time()
for x in range(0, 4):
    dowork()
print('total time: {} seconds'.format(time.time() - start))
``````

and the output

``````
 19016975777667561989667836343447216065093401859905...
32883203981076692018141849036349126447899294175228...
34450410116136243300565747102093690912732970152596...
50964938446237359434550325092232546411362261338846...
total time: 2.5557193756103516 seconds
10380860937556820815021239635380958917582122217407...
13309313630078624428079401365574221411759423165825...
``````

In this case dask uses `multiprocessing` to do the work, which may or may not be desirable for your case.