Andreas - 6 months ago 46

Python Question

I am working on matrix multiplications in NumPy using np.dot(). As the data set is very large, I would like to reduce the overall run time as far as possible, i.e. perform as few np.dot() products as possible.

Specifically, I need to calculate the overall matrix product as well as the associated flow from each element of my values vector.

Is there a way in NumPy to calculate all of this together in one or two np.dot() products?

In the code below, is there a way to reduce the number of np.dot() products and still get the same output?

```
import pandas as pd
import numpy as np

vector = pd.DataFrame([1, 2, 3],
                      ['A', 'B', 'C'], ["Values"])

matrix = pd.DataFrame([[0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]],
                      index = ['A', 'B', 'C'], columns = ['A', 'B', 'C'])

# Can the number of matrix multiplications in this part be reduced?
overall = np.dot(vector.T, matrix)
from_A = np.dot(vector.T * [1,0,0], matrix)
from_B = np.dot(vector.T * [0,1,0], matrix)
from_C = np.dot(vector.T * [0,0,1], matrix)

print("Overall:", overall)
print("From A:", from_A)
print("From B:", from_B)
print("From C:", from_C)
```

Answer

You could define a `3 x 3` shaped `2D` array of those scaling values and perform matrix-multiplication, like so -

```
scale = np.array([[1,0,0],[0,1,0],[0,0,1]])
from_ABC = np.dot(vector.values.ravel()*scale,matrix)
```

Sample run -

```
In [901]: from_A
Out[901]: array([[ 0.5, 0.4, 0.1]])
In [902]: from_B
Out[902]: array([[ 0.4, 1.2, 0.4]])
In [903]: from_C
Out[903]: array([[ 0.3, 0.9, 1.8]])
In [904]: from_ABC
Out[904]:
array([[ 0.5, 0.4, 0.1],
       [ 0.4, 1.2, 0.4],
       [ 0.3, 0.9, 1.8]])
```
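With the one-hot masks from the question, each row of `from_ABC` is the contribution of a single element of the values vector, so the overall product can be recovered as the column sum of those rows instead of a separate `np.dot()`. In that special case, multiplying by `scale` only builds a diagonal matrix, so the same rows can also be obtained by plain broadcasting. The following is a minimal sketch under those assumptions; the names `v`, `M` and `per_source` are illustrative and not part of the original code.

```python
import numpy as np

# Data from the question as plain arrays (v and M are illustrative names).
v = np.array([1.0, 2.0, 3.0])
M = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# One-hot masks: row i of the result is v[i] * M[i, :].
# Broadcasting a (3, 1) column against the (3, 3) matrix does this directly.
per_source = v[:, None] * M

# Summing the per-source rows gives the overall product of v and M.
overall = per_source.sum(axis=0)

# Cross-check against the np.dot() formulations used above.
scale = np.eye(3)
assert np.allclose(per_source, np.dot(v * scale, M))
assert np.allclose(overall, np.dot(v, M))
```

This shortcut only holds when the masks are one-hot. If the rows of `scale` overlap or carry other weights, the `np.dot()` form with the full `scale` array is the one to keep, and `overall` is no longer the sum of the rows.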

Here's an alternative with `np.einsum` to do all those in one step -

```
np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
```

Sample run -

```
In [915]: np.einsum('ij,ji,ik->jk',vector.values,scale,matrix)
Out[915]:
array([[ 0.5, 0.4, 0.1],
       [ 0.4, 1.2, 0.4],
       [ 0.3, 0.9, 1.8]])
```
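In the subscript string, `i` runs over the elements of the values vector, `j` over the mask rows of `scale`, and `k` over the columns of `matrix`. Since `i` does not appear in the output `jk`, it is summed over. The call above also labels the length-1 column axis of `vector.values` with `j`, so `j` has two different lengths across the operands. A variant that passes the vector as 1-D avoids that. This is a sketch on the same data, with `v` and `M` as illustrative names, checked against an explicit loop over the masks.

```python
import numpy as np

# Data from the question as plain arrays (v and M are illustrative names).
v = np.array([1.0, 2.0, 3.0])
M = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
scale = np.eye(3)  # one mask per row, as in the answer above

# from_ABC[j, k] = sum over i of v[i] * scale[j, i] * M[i, k]
from_ABC = np.einsum('i,ji,ik->jk', v, scale, M)

# Reference: one np.dot() per mask, as in the question.
expected = np.zeros((3, 3))
for j in range(3):
    expected[j] = np.dot(v * scale[j], M)

assert np.allclose(from_ABC, expected)
```

Whether this is faster than the single `np.dot()` depends on the array sizes, so it is worth timing both on the real data.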