Christian Bertram Christian Bertram - 1 year ago 78
Python Question

numpy: is matrix multiplication faster than sum of a vector?

I am implementing liner regression in python, using numpy. My implementation of the square cost function looks like this

square_cost(w, datax, datay):
ndata = datax.shape[1]
return (1/ndata)*np.sum((h(w, datax)-datay)**2)

All parameters are 2-dimensional ndarrays, but datay and the result only have height one.

What I later saw, was this implementation:

square_cost(w, datax, datay):
ndata = datax.shape[1]
diff = h(w, datax)-datay
return (1/ndata)*

I think my first implementation is the clearest, but is the second faster, since it uses dot product?

Answer Source


import numpy as np

np.random.seed([3, 1415])
x = np.random.rand(1000000, 1)
y = np.random.rand(1000000, 1)

diff = x - y

100 loops, best of 3: 3.66 ms per loop

diff = x - y

100 loops, best of 3: 6.85 ms per loop

I'd say yes. Dot product is faster.


For completeness and to address @Eric's concern in the comment wrt the OP's question.

In the regression at hand, the dimension of the endogenous (y) variable is n x 1. Therefore, we can safely assume the second dimension will always have a size of 1.

However, if it were to have a size greater than 1, say m, (and this is definitely possible, just not what the OP needs) then we would be looking at an exogenous (X) of dimension n x k and endogenous (Y) of dimension n x m. This implies a parameter matrix Theta of size k x m. Still totally reasonable. The kicker is that in order to calculate the sum of squared errors we'd be doing (X * Theta - Y) squared and summed. The inner product of (X * Theta - Y) is no longer appropriate as a means to calculate our cost function and its performance is irrelevant. To get something appropriate, we'd reshape (X * Theta - Y) to one dimension then take the inner product. In that case, the same analysis we've done over one dimension is still the most appropriate analysis.

All that said, I ran the following code:

idx = pd.Index(np.arange(1000, 501000, 1000)).sort_values()
ddd = pd.DataFrame(index=idx, columns=['dot', 'dif'])

def fill_ddd(row):
    i =
    x = np.random.rand(i, 1)

    s =
    for _ in range(100):
    row.loc['dot'] = ( - s).total_seconds() / 100.

    s =
    for _ in range(100):
    row.loc['dif'] = ( - s).total_seconds() / 100.

    return row

np.random.seed([3, 1415])

ddd.apply(fill_ddd, axis=1)    


enter image description here

The dot product is the clear winner and much more consistent.