Daniel F - 3 years ago 126
Python Question

# Keep first element of array that is part of a sequence

Say I have this type of array

``````
y
array([299839, 667136, 665420, 665418, 665421, 667135, 299799, 665419, 667137, 299800])
``````

as the result of a "top 10" `argpartition`:

``````
y = np.argpartition(-x, np.arange(10))[:10]
``````

Now I want to remove the elements that are sequential (runs of consecutive integers), keeping only the first (maximum) element of each run, such that:

``````
y_new
array([299839, 667136, 665420, 299799])
``````

While that seems like it should be simple, I'm not seeing an efficient way to do it (or even a good way to start). Assume the real-world application will take the top 1000 or so and will need to do this many times.

Here's one approach based on sorting -

``````
# Get the indices that would sort y
sidx = y.argsort()

# Get the sorted array
ys = y[sidx]

# Get the indices at which islands of sequential numbers start
# (a new island starts wherever the step from the previous value is not 1)
cut_idx = np.flatnonzero(np.concatenate(([True], np.diff(ys)!=1 )))

# Finally get the minimum original index for each island, i.e. the position
# of its first occurrence in y, and index into the input for the desired output
y_new = y[np.minimum.reduceat(sidx, cut_idx)]
``````
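To make the steps concrete, here is a small walk-through of the intermediate arrays for the array from the question. It is only a sketch that re-runs the same calls; the names `is_start` and `first_pos` are mine and do not appear in the snippet above.

```python
import numpy as np

# Sample array from the question
y = np.array([299839, 667136, 665420, 665418, 665421,
              667135, 299799, 665419, 667137, 299800])

sidx = y.argsort()   # positions in y, ordered by value
ys = y[sidx]         # sorted values

# True wherever a new island of consecutive integers begins
is_start = np.concatenate(([True], np.diff(ys) != 1))
cut_idx = np.flatnonzero(is_start)

# Smallest original position inside each island
first_pos = np.minimum.reduceat(sidx, cut_idx)

print(sidx.tolist())       # [6, 9, 0, 3, 7, 2, 4, 5, 1, 8]
print(cut_idx.tolist())    # [0, 2, 3, 7]
print(first_pos.tolist())  # [6, 0, 2, 1]
```

`reduceat` takes the minimum over `sidx[0:2]`, `sidx[2:3]`, `sidx[3:7]` and `sidx[7:]`, so each island contributes the position where it first shows up in `y`.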

If you would like to keep the order of elements in the output, sort the indices and then index at the last step -

``````
y[np.sort(np.minimum.reduceat(sidx, cut_idx))]
``````
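Since this needs to run many times, it may be convenient to wrap both variants in one function. This is a minimal sketch, not a tuned implementation: the function name `keep_first_of_runs` and the `keep_order` flag are hypothetical, and the early return for empty input is my own addition, made because `reduceat` has no segments to work on in that case.

```python
import numpy as np

def keep_first_of_runs(y, keep_order=True):
    """Keep only the first-occurring element of each run of consecutive integers.

    Hypothetical helper wrapping the sorting-based approach.
    """
    y = np.asarray(y)
    if y.size == 0:
        # Nothing to group; return an empty copy
        return y.copy()

    sidx = y.argsort()
    ys = y[sidx]

    # Start positions of islands of consecutive values in the sorted array
    cut_idx = np.flatnonzero(np.concatenate(([True], np.diff(ys) != 1)))

    # Earliest original position within each island
    first_pos = np.minimum.reduceat(sidx, cut_idx)

    if keep_order:
        # Restore the order in which the kept elements appear in y
        first_pos = np.sort(first_pos)
    return y[first_pos]


y = np.array([299839, 667136, 665420, 665418, 665421,
              667135, 299799, 665419, 667137, 299800])
print(keep_first_of_runs(y).tolist())
# [299839, 667136, 665420, 299799]
print(keep_first_of_runs(y, keep_order=False).tolist())
# [299799, 299839, 665420, 667136]
```

With `keep_order=False` the result comes out sorted by value, which skips the extra `np.sort` if the original order does not matter.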

Sample input, output -

``````
In [56]: y
Out[56]:
array([299839, 667136, 665420, 665418, 665421, 667135, 299799, 665419,
       667137, 299800])

In [57]: y_new
Out[57]: array([299799, 299839, 665420, 667136])

In [58]: y[np.sort(np.minimum.reduceat(sidx, cut_idx))]
Out[58]: array([299839, 667136, 665420, 299799])
``````