rxu - 2 months ago 12
Python Question

# How did numpy implement multi-dimensional broadcasting?

Memory (row major order):

``````[[A(0,0), A(0,1)]
[A(1,0), A(1,1)]]

has this memory layout:
[A(0,0), A(0,1), A(1,0), A(1,1)]
``````

I guess the algorithm work like this in the following cases.

``````[[0, 1, 2, 3]         [[1]
x
[4, 5, 6, 7]]         [10]]

A (2 by 4)            B (2 by 1)

Iterate 0th dimensions of A and B simultaneously {
Iterate last dimension of A{
multiply;
}
}
``````

``````[[0, 1, 2, 3]
x    [[1,10,100,1000]]
[4, 5, 6, 7]]

A (2 by 4)              B (1 by 4)

Iterate 0th dimension of A{
Iterate 1st dimensions of A and B simultaneously{
multiply;
}
}
``````

Question:

1. How does numpy know which order of multiplication is the best.
(reading memory in order is better than reading memory all over the place. but how did numpy figure that out?)

2. What would numpy do if the arrays have more than two dimension

3. What would numpy do if the broadcasting dimension is not the last dimension?

To really get into broadcasting details you need to understand array shape and strides. But a lot of the work is now implemented in `c` code using `nditer`. You can read about it at http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html. `np.nditer` gives you access to the tool at the Python level, but its real value comes when used with `cython` or your own `c` code.

`np.lib.stride_tricks` has functions that let you play with strides. One of its functions helps visualize how arrays are broadcasted together. In practice the work is done with `nditer`, but this function may help understand the action:

``````In [629]: np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
np.array([[1],[2]]))
Out[629]:
[array([[0, 1, 2],
[3, 4, 5]]),
array([[1, 1, 1],
[2, 2, 2]])]
``````

Note that, effectively the 2nd array has been replicated to match the 1st's shape. But the replication is done with stride tricks, not with actual copies.

``````In [631]: A,B=np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
np.array([[1],[2]]))
In [632]: A.shape
Out[632]: (2, 3)
In [633]: A.strides
Out[633]: (12, 4)
In [634]: B.shape
Out[634]: (2, 3)
In [635]: B.strides
Out[635]: (4, 0)
``````

It's this `(4,0)` strides that does the replication without copy.

=================

Using the python level `nditer`, here's what it does during broadcasting.

``````In [1]: A=np.arange(6).reshape(2,3)
In [2]: B=np.array([[1],[2]])
``````

The plain nditer feeds elements one set at a time http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#using-an-external-loop

``````In [5]: it =np.nditer((A,B))
In [6]: for a,b in it:
...:     print(a,b)

0 1
1 1
2 1
3 2
4 2
5 2
``````

But when I turn `extenal_loop` on, it iterates in chunks, here respective rows of the broadcasted arrays:

``````In [7]: it =np.nditer((A,B), flags=['external_loop'])
In [8]: for a,b in it:
...:     print(a,b)

[0 1 2] [1 1 1]
[3 4 5] [2 2 2]
``````

With a more complex broadcasting the `external_loop` still produces 1d arrays that allow simple `c` style iteration:

``````In [13]: A1=np.arange(24).reshape(3,2,4)
In [18]: it =np.nditer((A1,np.arange(3)[:,None,None]), flags=['external_loop'])
In [19]: while not it.finished:
...:     print(it[:])
...:     it.iternext()
...:
(array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 0, 0, 0, 0, 0, 0, 0]))
(array([ 8,  9, 10, 11, 12, 13, 14, 15]), array([1, 1, 1, 1, 1, 1, 1, 1]))
(array([16, 17, 18, 19, 20, 21, 22, 23]), array([2, 2, 2, 2, 2, 2, 2, 2]))
``````

Note that while `A1` is (3,2,4), the nditer loop yields 3 steps (1st axis) with 2*4 length elements.

I found in another `cython/nditer` SO question that the first approach did not produce much of a speed improvement, but the 2nd helped a lot. In `c` or `cython` the `external_loop` case would do simple low level iteration.