Bitwise Bitwise - 3 months ago 11
Python Question

How can one efficiently remove a range of rows from a large numpy array?

Given a large 2d numpy array, I would like to remove a range of rows, say rows

efficiently. I have to do this multiple times with different ranges, so I would like to also make it parallelizable.

Using something like
is not efficient, since it needs to copy the array, taking too much time and memory. Ideally I would want to do something like create a view, but I am not sure how I could do this in this case. A masked array is also not an option since the downstream operations are not supported on masked arrays.

Any ideas?


Because of the strided data structure that defines a numpy array, what you want will not be possible without using a masked array. Your best option might be to use a masked array (or perhaps your own boolean array) to mask the deleted the rows, and then do a single real delete operation of all the rows to be deleted before passing it downstream.