Leon Berkers - 1 year ago 142
Python Question

# NumPy conversion of column values into row values

I take 3 consecutive values from the third column and place them in a row across 3 new columns. I then merge the old and new columns into a new matrix A.

Input: the time series is in column 3, with other values in columns 1 and 2

``````
[x x 1]
[x x 2]
[x x 3]
[x x 4]
``````

Output: matrix A

``````
[x x 1 0 0 0]
[x x 2 0 0 0]
[x x 3 1 2 3]
[x x 4 2 3 4]
``````

In short, the code first generates a matrix with 6 rows and 3 columns. I want to use the last column to fill 3 extra columns and merge them with the original columns into a new matrix A. The extra columns start with 2 rows of zeros to offset the starting position.

I have implemented this idea in the code below, but it takes a really long time to process large data sets.
How can I improve the speed of this conversion?

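``````
import numpy as np

matrix = np.arange(18).reshape((6, 3))

nr = 3
# Start with nr-1 rows of zeros to offset the first full window
A = np.zeros((nr - 1, nr))

for x in range(matrix.shape[0] - nr + 1):
    # Take nr consecutive values of the third column as one row
    newrow = np.transpose(matrix[x:x + nr, 2:3])
    A = np.vstack([A, newrow])

total = np.column_stack((matrix, A))
print(total)
``````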

Here's an approach using `broadcasting` to get the sliding-window elements, followed by an assignment into a zero-padded array to get `A` -

``````
col2 = matrix[:, 2]
nrows = col2.size - nr + 1
out = np.zeros((nr - 1 + nrows, nr))
col2_2D = np.take(col2, np.arange(nrows)[:, None] + np.arange(nr))
out[nr - 1:] = col2_2D
``````
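To make the indexing step concrete, here is a minimal sketch of what the broadcast sum produces for the 6 x 3 example from the question. The names `idx` and `windows` are introduced only for illustration, and `col2` is written out by hand.

```python
import numpy as np

nr = 3
# Third column of np.arange(18).reshape((6, 3)), written out by hand
col2 = np.array([2, 5, 8, 11, 14, 17])
nrows = col2.size - nr + 1  # number of full windows, here 4

# A (nrows, 1) column of window starts plus a (nr,) row of offsets
# broadcasts to an (nrows, nr) array of indices into col2.
idx = np.arange(nrows)[:, None] + np.arange(nr)
# idx is:
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]

# Indexing with idx gathers one window per row (same as np.take(col2, idx))
windows = col2[idx]
print(windows)
```

Each row of `idx` is one window of consecutive positions, so a single indexing operation replaces the Python loop.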

Here's an efficient alternative using `NumPy strides` to get `col2_2D` -

``````
n = col2.strides[0]
col2_2D = np.lib.stride_tricks.as_strided(col2, shape=(nrows, nr), strides=(n, n))
``````
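One caveat with `as_strided` is that a wrong `shape` or `strides` value can read memory outside the array. As a possible alternative, not part of the approach above, NumPy 1.20 and later provide `sliding_window_view`, which builds the same kind of windowed view with bounds checking. A minimal sketch, assuming a recent enough NumPy:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

matrix = np.arange(18).reshape((6, 3))
nr = 3
col2 = matrix[:, 2]

# Read-only view of shape (col2.size - nr + 1, nr); no data is copied here
col2_2D = sliding_window_view(col2, nr)

out = np.zeros((matrix.shape[0], nr))
# Copying into `out` detaches the result from `matrix`
out[nr - 1:] = col2_2D
print(out)
```

The view is read-only by default, which is fine here because the values are copied into `out` right away.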

It would be even better to initialize an output array of zeros of the same size as `total`, then assign `col2_2D` into its new columns and the input array `matrix` into its original columns. This avoids the extra copy made by `np.column_stack`.
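A minimal sketch of that preallocation idea follows. The function name `build_total` is hypothetical, and the sketch assumes the input has at least `nr` rows and that the time series sits in column index 2, as in the question.

```python
import numpy as np

def build_total(matrix, nr):
    """Hypothetical helper: build `total` directly in one preallocated array."""
    nrows, ncols = matrix.shape
    col2 = matrix[:, 2]
    nwin = nrows - nr + 1  # number of full windows

    # Float zeros, matching the dtype np.column_stack gives with a float A
    total = np.zeros((nrows, ncols + nr))
    total[:, :ncols] = matrix  # original columns

    # Sliding windows via broadcasting, written into the new columns
    idx = np.arange(nwin)[:, None] + np.arange(nr)
    total[nr - 1:, ncols:] = col2[idx]
    return total

total = build_total(np.arange(18).reshape((6, 3)), 3)
print(total)
```

The first `nr - 1` rows of the new columns are never written, so they keep the zeros from the initial allocation.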

Runtime test

Approaches as functions -

``````
def org_app1(matrix, nr):
    # Original loop: grows A one row at a time with np.vstack
    A = np.zeros((nr - 1, nr))
    for x in range(matrix.shape[0] - nr + 1):
        newrow = np.transpose(matrix[x:x + nr, 2:3])
        A = np.vstack([A, newrow])
    return A

def vect_app1(matrix, nr):
    # Broadcasting: build all window indices at once
    col2 = matrix[:, 2]
    nrows = col2.size - nr + 1
    out = np.zeros((nr - 1 + nrows, nr))
    col2_2D = np.take(col2, np.arange(nrows)[:, None] + np.arange(nr))
    out[nr - 1:] = col2_2D
    return out

def vect_app2(matrix, nr):
    # Strides: create a windowed view without copying
    col2 = matrix[:, 2]
    nrows = col2.size - nr + 1
    out = np.zeros((nr - 1 + nrows, nr))
    n = col2.strides[0]
    col2_2D = np.lib.stride_tricks.as_strided(col2,
                                              shape=(nrows, nr), strides=(n, n))
    out[nr - 1:] = col2_2D
    return out
``````

Timings and verification -

``````
In [18]: # Setup input array and params
    ...: matrix = np.arange(1800).reshape((60, 30))
    ...: nr=3
    ...:

In [19]: np.allclose(org_app1(matrix,nr),vect_app1(matrix,nr))
Out[19]: True

In [20]: np.allclose(org_app1(matrix,nr),vect_app2(matrix,nr))
Out[20]: True

In [21]: %timeit org_app1(matrix,nr)
1000 loops, best of 3: 646 µs per loop

In [22]: %timeit vect_app1(matrix,nr)
10000 loops, best of 3: 20.6 µs per loop

In [23]: %timeit vect_app2(matrix,nr)
10000 loops, best of 3: 21.5 µs per loop

In [28]: # Setup input array and params
    ...: matrix = np.arange(7200).reshape((120, 60))
    ...: nr=30
    ...:

In [29]: %timeit org_app1(matrix,nr)
1000 loops, best of 3: 1.19 ms per loop

In [30]: %timeit vect_app1(matrix,nr)
10000 loops, best of 3: 45 µs per loop

In [31]: %timeit vect_app2(matrix,nr)
10000 loops, best of 3: 27.2 µs per loop
``````