renegade - 23 days ago
Python Question

How to remove duplicate rows from an array based on the first column

I have the following situation:

>>> a  # what I have
array([[0, 1],
       [0, 2],
       [0, 2],
       [1, 3],
       [1, 3],
       [2, 1]])
>>> new_a  # what I want to get
array([[0, 1],
       [1, 3],
       [2, 1]])


Basically, I'm looking for a pure NumPy way to remove an entire row if its first-column value has already appeared in an earlier row. For example, the first row is [0, 1] and the second is [0, 2]. Since the 0 in the first column is duplicated, I would like to keep the first instance and remove the others.

I'm sure I could set up some if statements and while loops, but I am wondering if there is a more elegant solution. Thanks!

Answer

Here's one way to do it with np.unique: pass return_index=True to get the index of the first occurrence of each unique value in the first column, then use those indices to select rows along the first axis:

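import numpy as np

arr = np.array([[0, 1], [0, 2], [0, 2], [1, 3], [1, 3], [2, 1]])
# Index of the first occurrence of each unique value in column 0
_, indices = np.unique(arr[:, 0], return_index=True)
print(arr[indices, :])
# [[0 1]
#  [1 3]
#  [2 1]]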
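
One thing worth noting: np.unique returns the unique values sorted, so the rows selected above come out ordered by first-column value rather than by their original position. For the example in the question the two orders coincide, but they may differ if the first column is not already sorted. Below is a minimal sketch that wraps the same idea in a helper and optionally keeps the original row order by sorting the indices. The function name drop_duplicate_keys, its preserve_order parameter, and the second array b are hypothetical and used only for illustration.

```python
import numpy as np


def drop_duplicate_keys(arr, preserve_order=True):
    """Keep only the first row for each distinct value in column 0.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # return_index=True yields the index of the first occurrence
    # of each unique value, ordered by the sorted unique values.
    _, first_idx = np.unique(arr[:, 0], return_index=True)
    if preserve_order:
        # Sorting the indices restores the original row order.
        first_idx = np.sort(first_idx)
    return arr[first_idx]


# The array from the question.
a = np.array([[0, 1], [0, 2], [0, 2], [1, 3], [1, 3], [2, 1]])
new_a = drop_duplicate_keys(a)
print(new_a)

# An array whose first column is not sorted (made up for this example).
b = np.array([[2, 1], [0, 1], [2, 5], [0, 2]])
print(drop_duplicate_keys(b))                        # rows [2, 1] then [0, 1]
print(drop_duplicate_keys(b, preserve_order=False))  # rows [0, 1] then [2, 1]
```

If the output order does not matter, the np.sort step can be skipped and the helper reduces to the two-line version above.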