ryanjdillon - 3 years ago
Python Question

Subsample 1-D array using 2-D indices in numpy

Background:
The data I'm using is being extracted from a netCDF4 object, which creates a numpy masked array at initialization but does not appear to support the numpy reshape() method. Reshaping is therefore only possible after all the data has been copied, which is way too slow.

Question: How can I sub-sample a 1-D array, that is basically a flattened 2-D array, without reshaping it?

import numpy as np

a1 = np.array([[1,2,3,4],
               [11,22,33,44],
               [111,222,333,444],
               [1111,2222,3333,4444],
               [11111,22222,33333,44444]])

a2 = np.ravel(a1)

rows, cols = a1.shape

row1 = 1
row2 = 3

col1 = 1
col2 = 3


I would like to use a fast slicing method that doesn't require reshaping the 1-D array to a 2-D array.

Desired Output:

np.ravel(a1[row1:row2, col1:col2])

>> array([ 22, 33, 222, 333])


I got as far as getting the start and ending positions, but this just selects ALL data between these points (i.e. extra columns).

idx_start = (row1 * cols) + col1
idx_end = (row2 * cols) + col2
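To make the over-selection concrete, here is a minimal sketch using the example array above. It also shows one possible workaround: read only the whole rows row1 to row2 - 1 as a single contiguous 1-D slice, then reshape and trim that small block. The names naive, block and subset are mine, and whether a single contiguous read is fast enough for a given netCDF4 file is an assumption, not something tested here.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
a2 = np.ravel(a1)
rows, cols = a1.shape
row1, row2 = 1, 3
col1, col2 = 1, 3

# Range from the question: it also picks up the unwanted columns of
# every row it crosses, and it runs on into row2 itself.
idx_start = (row1 * cols) + col1
idx_end = (row2 * cols) + col2
naive = a2[idx_start:idx_end]

# One contiguous 1-D read covering only rows row1 .. row2-1.
block = a2[row1 * cols:row2 * cols]
# Reshape just this small block (not the full array), then trim columns.
subset = block.reshape(row2 - row1, cols)[:, col1:col2].ravel()
```

This still copies the unwanted columns of the selected rows, but never touches rows outside row1:row2.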


Update:
I just tried Jaime's brilliant answer, but it appears that netCDF4 won't allow for 2-D indices.

z = dataset.variables["z"][idx]
File "netCDF4.pyx", line 2613, in netCDF4.Variable.__getitem__ (netCDF4.c:29583)
File "/usr/local/lib/python2.7/dist-packages/netCDF4_utils.py", line 141, in _StartCountStride
raise IndexError("Index cannot be multidimensional.")
IndexError: Index cannot be multidimensional.
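Since the error is about the index being multidimensional, one thing that might be worth trying is flattening the index array to 1-D before using it. The sketch below builds the flat positions by broadcasting and ravels them. It is demonstrated on a plain numpy array only; whether a netCDF4 variable accepts a 1-D integer index array like this (and how fast it is) is an assumption I have not verified.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
a2 = np.ravel(a1)
rows, cols = a1.shape
row1, row2 = 1, 3
col1, col2 = 1, 3

# Flat offset of the start of each wanted row, as a column vector.
row_offsets = np.arange(row1, row2)[:, None] * cols
# Column positions within a row.
col_offsets = np.arange(col1, col2)
# Broadcasting gives a 2-D grid of flat positions; ravel makes it 1-D.
idx = (row_offsets + col_offsets).ravel()

result = a2[idx]
```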

Answer Source

I came up with this, and though it doesn't copy ALL of the data, it is still copying data that I don't want into memory. This can probably be improved and I hope there is a better solution out there.

zi = 0
# Create zero array with the appropriate length for the data subset
z = np.zeros((col2 - col1) * (row2 - row1))
# Process each row for which data is being extracted (row1 up to, not including, row2)
for i in range(row1, row2):
    # Pull row i from the flat data, then desired elements of that row into buffer
    tmp = dataset.variables["z"][(i * cols):((i * cols) + cols)][col1:col2]
    # Add each item in buffer sequentially to data array
    for j in tmp:
        z[zi] = j
        # Keep a count of what index position the next data point goes to
        zi += 1
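A possible refinement, as a rough sketch: slice only the wanted columns of each row directly from the flat data, so the unwanted columns are never requested, and copy each chunk with a slice assignment instead of an element-by-element loop. The helper name read_block is hypothetical, and flat stands in for dataset.variables["z"]; I am assuming it supports basic 1-D slicing and has a dtype attribute, and I have only tried this on a plain numpy array. Note that writing into a plain array drops any mask from a masked array.

```python
import numpy as np

def read_block(flat, cols, row1, row2, col1, col2):
    """Collect rows row1:row2, columns col1:col2 from a flattened 2-D array.

    flat is any 1-D sequence that supports basic slicing (assumption:
    a netCDF4 variable could be passed here in place of a numpy array).
    """
    width = col2 - col1
    out = np.empty((row2 - row1) * width, dtype=flat.dtype)
    for i, row in enumerate(range(row1, row2)):
        # Flat position of the first wanted element in this row.
        start = row * cols + col1
        # Read only the wanted columns and store them in one assignment.
        out[i * width:(i + 1) * width] = flat[start:start + width]
    return out

# Stand-in data: the example array from the question, flattened.
a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
a2 = np.ravel(a1)
z = read_block(a2, cols=a1.shape[1], row1=1, row2=3, col1=1, col2=3)
```

This makes one small read per row, so it trades fewer copied elements for more read calls.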