KrunalParmar - 2 months ago 6
Python Question

# Take unique rows of a NumPy array based on two column values

I have a NumPy array in Python with two columns, as follows:

``````
time,id
1,a
2,b
3,a
1,a
5,c
6,b
3,a
``````

I want the unique times for each user.
For the data above, I want the output below:

``````
time,id
1,a
2,b
3,a
5,c
6,b
``````

That is, I want to keep only the unique rows, so `1,a` and `3,a` should not repeat in the result.
Both columns have a string datatype, and the 2-D array is very large.
One solution would be to iterate over all the rows and build a set, but that would be very slow. Please suggest an efficient way to implement this.

Given:

``````
>>> b
[['1' 'a']
 ['2' 'b']
 ['3' 'a']
 ['1' 'a']
 ['5' 'c']
 ['6' 'b']
 ['3' 'a']]
``````

You can do:

``````
>>> # Recent NumPy versions require a list or tuple here, not a bare set
>>> np.vstack(list({tuple(e) for e in b}))
[['3' 'a']
 ['1' 'a']
 ['2' 'b']
 ['6' 'b']
 ['5' 'c']]
``````

Since that is a set comprehension, you lose the order of the original.
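If the original order matters, the same tuple idea can be kept by swapping the set for a `dict`, whose keys preserve insertion order in Python 3.7+. This is only a sketch of that variant, not part of the original answer; the array `b` is rebuilt here from the example data and `unique_rows` is a name chosen for illustration.

```python
import numpy as np

# Example data from the question (both columns are strings)
b = np.array([['1', 'a'], ['2', 'b'], ['3', 'a'], ['1', 'a'],
              ['5', 'c'], ['6', 'b'], ['3', 'a']])

# dict keys keep insertion order, so the first occurrence of each row
# stays where it first appeared; later duplicates are dropped.
unique_rows = np.array(list(dict.fromkeys(map(tuple, b))))

print(unique_rows)
```

Like the set version, this still loops over the rows in Python, so it may be slow on very large arrays.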

Or, to maintain order, you can do:

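``````
>>> # View each row as a single opaque (void) item so np.unique compares whole rows
>>> c = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
>>> _, idx = np.unique(c, return_index=True)
>>> # idx follows the sorted unique values; sort it to restore the original row order
>>> b[np.sort(idx)]
[['1' 'a']
 ['2' 'b']
 ['3' 'a']
 ['5' 'c']
 ['6' 'b']]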
``````
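On newer NumPy versions the void-view trick may not be needed. The sketch below assumes NumPy 1.13 or later, where `np.unique` accepts an `axis` argument; the names `first_idx` and `ordered` are chosen for illustration.

```python
import numpy as np

# Example data from the question (both columns are strings)
b = np.array([['1', 'a'], ['2', 'b'], ['3', 'a'], ['1', 'a'],
              ['5', 'c'], ['6', 'b'], ['3', 'a']])

# axis=0 makes np.unique treat each row as one item (assumes NumPy >= 1.13).
# return_index gives the index of the first occurrence of each unique row.
_, first_idx = np.unique(b, axis=0, return_index=True)

# np.unique orders its results by sorted row value, so sort the indices
# to get the unique rows back in their original order of appearance.
ordered = b[np.sort(first_idx)]

print(ordered)
```

With the example data the sorted order and the original order happen to coincide, but sorting the indices keeps the result order-preserving for other inputs as well.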

Or, if you can use Pandas, this is really easy. Given the following DataFrame:

``````
>>> df
  id  time
0  a     1
1  b     2
2  a     3
3  a     1
4  c     5
5  b     6
6  a     3
``````

Just use `drop_duplicates()`:

``````
>>> df.drop_duplicates()
  id  time
0  a     1
1  b     2
2  a     3
4  c     5
5  b     6
``````
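To go from the NumPy array in the question to a DataFrame and back, something like the following should work. It is a minimal sketch: the column names `time` and `id` are taken from the question's header, the variable names are illustrative, and `to_numpy()` assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Example data from the question (both columns are strings)
b = np.array([['1', 'a'], ['2', 'b'], ['3', 'a'], ['1', 'a'],
              ['5', 'c'], ['6', 'b'], ['3', 'a']])

# Wrap the array in a DataFrame; the column names are assumed from the question
df = pd.DataFrame(b, columns=['time', 'id'])

# drop_duplicates() keeps the first occurrence of each row by default,
# so the original order is preserved.
unique_df = df.drop_duplicates()

# Convert back to a NumPy array if needed (assumes pandas >= 0.24)
unique_rows = unique_df.to_numpy()

print(unique_rows)
```

The retained index labels (0, 1, 2, 4, 5) show which original rows survived; call `reset_index(drop=True)` if a fresh index is preferred.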