Daniel F - 3 years ago
Python Question

Keep first element of array that is part of a sequence

Say I have this type of array y:

array([299839, 667136, 665420, 665418, 665421, 667135, 299799, 665419, 667137, 299800])


as the result of a "top 10" argpartition:

y = np.argpartition(-x, np.arange(10))[:10]
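As a minimal sketch of what that call does, the snippet below uses a made-up array x (the real x is not shown in the question) and checks that passing np.arange(k) as the kth argument leaves the first k positions fully sorted, so the result matches the first k entries of a full argsort when the values are distinct.

```python
import numpy as np

# Hypothetical data: 1000 distinct values standing in for the real x.
rng = np.random.default_rng(0)
x = rng.permutation(1000).astype(float)

k = 10
# Passing a sequence of kth positions puts each of the first k slots
# in its final sorted position, so y is ordered by descending x.
y = np.argpartition(-x, np.arange(k))[:k]

# With distinct values this agrees with a full descending argsort.
expected = np.argsort(-x)[:k]
print(np.array_equal(y, expected))
```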


Now I want to remove the elements that belong to a run of consecutive values, keeping only the first element of each run (the one with the maximum value in x), such that:

y_new
array([299839, 667136, 665420, 299799])


That seems like it should be simple, but I'm not seeing an efficient way to do it (or even a good way to start). Assume the real-world application will take the top 1000 or so and will need to do this many times.

Answer Source

Here's one approach based on sorting -

import numpy as np

# Get the sorted indices
sidx = y.argsort()

# Get sorted array
ys = y[sidx]

# Get indices (into the sorted array) at which islands of sequential numbers start
cut_idx = np.flatnonzero(np.concatenate(([True], np.diff(ys)!=1 )))

# Finally get the minimum indices for each island and then index into
# input for the desired output
y_new = y[np.minimum.reduceat(sidx, cut_idx)]
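To illustrate the reduceat step in isolation, here is a small sketch using the sidx and cut_idx values that the sample y from the question produces (worked out by hand from that sample). It compares np.minimum.reduceat against an explicit per-slice minimum; the name manual is only for illustration.

```python
import numpy as np

# Values obtained from the sample y in the question.
sidx = np.array([6, 9, 0, 3, 7, 2, 4, 5, 1, 8])  # y.argsort()
cut_idx = np.array([0, 2, 3, 7])                 # island start positions

# reduceat takes the minimum over sidx[0:2], sidx[2:3], sidx[3:7], sidx[7:].
mins = np.minimum.reduceat(sidx, cut_idx)

# The same result computed slice by slice, for comparison.
ends = np.append(cut_idx[1:], len(sidx))
manual = np.array([sidx[a:b].min() for a, b in zip(cut_idx, ends)])

print(mins)    # position in y of the earliest element of each island
print(manual)
```

Each minimum is the smallest original position within an island, that is, the element of the island that appears first in y.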

If you would like to keep the order of elements in the output, sort the indices and then index at the last step -

y[np.sort(np.minimum.reduceat(sidx, cut_idx))]
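Both variants can be wrapped in one helper. This is only a sketch: the function name keep_first_of_runs and the keep_order flag are hypothetical, and the empty-input guard is an addition, since reduceat would fail on an empty array. Duplicate values are not treated specially (a difference of 0 starts a new island).

```python
import numpy as np

def keep_first_of_runs(y, keep_order=True):
    """Keep, for each run of consecutive integers, the element that
    appears first in y (hypothetical helper based on the steps above)."""
    y = np.asarray(y)
    if y.size == 0:
        # Added guard: reduceat cannot handle an empty input.
        return y.copy()

    sidx = y.argsort()   # positions of y in ascending value order
    ys = y[sidx]         # sorted values

    # An island starts at position 0 and wherever the step is not exactly 1.
    starts = np.flatnonzero(np.concatenate(([True], np.diff(ys) != 1)))

    # Smallest original position within each island.
    first_pos = np.minimum.reduceat(sidx, starts)

    if keep_order:
        # Restore the order in which the kept elements appear in y.
        first_pos = np.sort(first_pos)
    return y[first_pos]

y = np.array([299839, 667136, 665420, 665418, 665421,
              667135, 299799, 665419, 667137, 299800])
print(keep_first_of_runs(y))
print(keep_first_of_runs(y, keep_order=False))
```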

Sample input, output -

In [56]: y
Out[56]: 
array([299839, 667136, 665420, 665418, 665421, 667135, 299799, 665419,
       667137, 299800])

In [57]: y_new
Out[57]: array([299799, 299839, 665420, 667136])

In [58]: y[np.sort(np.minimum.reduceat(sidx, cut_idx))]
Out[58]: array([299839, 667136, 665420, 299799])
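For cross-checking the vectorized result on small inputs, a plain-Python reference might look like the sketch below. It is not part of the approach above and will be much slower on large arrays; the function name keep_first_of_runs_reference is hypothetical.

```python
def keep_first_of_runs_reference(values):
    """Slow reference: group consecutive integers, keep the element of
    each group that appears first in the input, preserving input order."""
    # Positions of the input ordered by ascending value.
    order = sorted(range(len(values)), key=lambda i: values[i])

    groups = []
    for i in order:
        # Extend the current group only when the value is exactly one
        # more than the previous value in sorted order.
        if groups and values[i] - values[groups[-1][-1]] == 1:
            groups[-1].append(i)
        else:
            groups.append([i])

    # Earliest input position per group, then back in input order.
    firsts = sorted(min(g) for g in groups)
    return [values[i] for i in firsts]

sample = [299839, 667136, 665420, 665418, 665421,
          667135, 299799, 665419, 667137, 299800]
print(keep_first_of_runs_reference(sample))
```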