JanM JanM - 3 months ago 17x
Python Question

Cannot replicate results comparing Python, Numpy and Numba matrix multiplication

So while evaluating possibilities to speed up Python code i came across this Stack Overflow post: Comparing Python, Numpy, Numba and C++ for matrix multiplication

I was quite impressed with numba's performance and implemented some of our function in numba. Unfortunately the speedup was only there for very small matrices and for large matrices the code became very slow compared to the previous scipy sparse implementation. I thought this made sense but nevertheless i repeated the test in the original post (code below).

When using a 1000 x 1000 matrix, according to that post even the python implementation should take roughly 0,01 s. Here's my results though:

python : 769.6387 seconds

numpy : 0.0660 seconds

numba : 3.0779 seconds

scipy : 0.0030 seconds

What am i doing wrong to get such different results than the original post? I copied the functions and did not change anything. I tried both Python 3.5.1 (64 bit) and Python 2.7.10 (32 bit), a colleague tried the same code with the same results. This is the result for a 100x100 matrix:

python : 0.6916 seconds

numpy : 0.0035 seconds

numba : 0.0015 seconds

scipy : 0.0035 seconds

Did i make some obvious mistakes?

import numpy as np
import numba as nb
import scipy.sparse
import time

class benchmark(object):
def __init__(self, name):
self.name = name

def __enter__(self):
self.start = time.time()

def __exit__(self, ty, val, tb):
end = time.time()
print("%s : %0.4f seconds" % (self.name, end-self.start))
return False

def dot_py(A, B):
m, n = A.shape
p = B.shape[1]

C = np.zeros((m, p))

for i in range(0, m):
for j in range(0, p):
for k in range(0, n):
C[i, j] += A[i, k] * B[k, j]
return C

def dot_np(A, B):
C = np.dot(A,B)
return C

def dot_scipy(A, B):
C = A * B
return C

dot_nb = nb.jit(nb.float64[:,:](nb.float64[:,:], nb.float64[:,:]), nopython=True)(dot_py)

dim_x = 1000
dim_y = 1000
a = scipy.sparse.rand(dim_x, dim_y, density=0.01)
b = scipy.sparse.rand(dim_x, dim_y, density=0.01)
a_full = a.toarray()
b_full = b.toarray()

print("starting test")

with benchmark("python"):
dot_py(a_full, b_full)

with benchmark("numpy"):
dot_np(a_full, b_full)

with benchmark("numba"):
dot_nb(a_full, b_full)

with benchmark("scipy"):
dot_scipy(a, b)

print("finishing test")


for anyone seeing this at a later time. this is the results i got when using sparse nxn matrices (1% of elements are nonzero).
runtime of nxn matrix multiplication


In the linked stackoverflow question where you got the code from, m = n = 3 and p is variable, whereas you are using m = n = 1000, which is going to make a huge difference in the timings.