Christian Bertram - 2 years ago 120
Python Question

numpy: is matrix multiplication faster than sum of a vector?

I am implementing liner regression in python, using numpy. My implementation of the square cost function looks like this

``````square_cost(w, datax, datay):
ndata = datax.shape[1]
return (1/ndata)*np.sum((h(w, datax)-datay)**2)
``````

All parameters are 2-dimensional ndarrays, but datay and the result only have height one.

``````square_cost(w, datax, datay):
ndata = datax.shape[1]
diff     = h(w, datax)-datay
return (1/ndata)*diff.dot(diff.T)
``````

I think my first implementation is the clearest, but is the second faster, since it uses dot product?

Setup

``````import numpy as np

np.random.seed([3, 1415])
x = np.random.rand(1000000, 1)
y = np.random.rand(1000000, 1)

%%timeit
diff = x - y
diff.T.dot(diff)
``````

100 loops, best of 3: 3.66 ms per loop

``````%%timeit
diff = x - y
np.square(diff).sum()
``````

100 loops, best of 3: 6.85 ms per loop

I'd say yes. Dot product is faster.

EDIT:

For completeness and to address @Eric's concern in the comment wrt the OP's question.

In the regression at hand, the dimension of the endogenous (y) variable is n x 1. Therefore, we can safely assume the second dimension will always have a size of 1.

However, if it were to have a size greater than 1, say m, (and this is definitely possible, just not what the OP needs) then we would be looking at an exogenous (X) of dimension n x k and endogenous (Y) of dimension n x m. This implies a parameter matrix Theta of size k x m. Still totally reasonable. The kicker is that in order to calculate the sum of squared errors we'd be doing (X * Theta - Y) squared and summed. The inner product of (X * Theta - Y) is no longer appropriate as a means to calculate our cost function and its performance is irrelevant. To get something appropriate, we'd reshape (X * Theta - Y) to one dimension then take the inner product. In that case, the same analysis we've done over one dimension is still the most appropriate analysis.

All that said, I ran the following code:

``````idx = pd.Index(np.arange(1000, 501000, 1000)).sort_values()
ddd = pd.DataFrame(index=idx, columns=['dot', 'dif'])

def fill_ddd(row):
i = row.name
x = np.random.rand(i, 1)

s = pd.datetime.now()
for _ in range(100):
x.T.dot(x)
row.loc['dot'] = (pd.datetime.now() - s).total_seconds() / 100.

s = pd.datetime.now()
for _ in range(100):
np.square(x).sum()
row.loc['dif'] = (pd.datetime.now() - s).total_seconds() / 100.

return row

np.random.seed([3, 1415])

ddd.apply(fill_ddd, axis=1)

ddd.plot()
``````

The dot product is the clear winner and much more consistent.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download