Bitwise Bitwise - 4 months ago 13
Python Question

How can one efficiently remove a range of rows from a large numpy array?

Given a large 2d numpy array, I would like to remove a range of rows, say rows

10000:10010
efficiently. I have to do this multiple times with different ranges, so I would like to also make it parallelizable.

Using something like
numpy.delete()
is not efficient, since it needs to copy the array, taking too much time and memory. Ideally I would want to do something like create a view, but I am not sure how I could do this in this case. A masked array is also not an option since the downstream operations are not supported on masked arrays.

Any ideas?

Answer

Because of the strided data structure that defines a numpy array, what you want will not be possible without using a masked array. Your best option might be to use a masked array (or perhaps your own boolean array) to mask the deleted the rows, and then do a single real delete operation of all the rows to be deleted before passing it downstream.

Comments