Daniel Messias - 1 month ago

Drop duplicates for rows with interchangeable name values (Pandas, Python)

I have a DataFrame of the form:

person1, person2, ..., someMetric
John, Steve, ..., 20
Peter, Larry, ..., 12
Steve, John, ..., 20


Rows 0 and 2 are interchangeable duplicates (the same two names in swapped order), so I want to drop the last row. I can't figure out how to do this in Pandas.

Thanks!

Answer

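Here's a NumPy-based solution (assuming import numpy as np). It builds an n-by-n boolean matrix whose entry [i, j] is True when row j holds row i's two names in swapped order. np.triu(..., k=1) keeps only the entries with i < j, so each row is compared against earlier rows only (and never against itself), and .any(0) flags every row that repeats an earlier pair. Because of the n-by-n matrix, this suits frames of moderate size:

p1, p2 = df.person1.to_numpy(), df.person2.to_numpy()
df[~np.triu((p1[:, None] == p2) & (p2[:, None] == p1), k=1).any(0)]

Sample run:

In [123]: df
Out[123]: 
  person1 person2 someMetric
0    John   Steve         20
1   Peter   Larry         13
2   Steve    John         19
3   Peter  Parker          5
4   Larry   Peter          7

In [124]: p1, p2 = df.person1.to_numpy(), df.person2.to_numpy()

In [125]: df[~np.triu((p1[:, None] == p2) & (p2[:, None] == p1), k=1).any(0)]
Out[125]: 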
  person1 person2 someMetric
0    John   Steve         20
1   Peter   Larry         13
3   Peter  Parker          5
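If you'd rather stay in pandas, here's a sketch of a different technique (not the NumPy mask above): sort the two names within each row so that swapped pairs become identical, then keep only the first occurrence of each sorted pair with duplicated(). It assumes the name columns are called person1 and person2 as in the example; drop_swapped_duplicates is just an illustrative helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def drop_swapped_duplicates(df, cols=("person1", "person2")):
    """Drop rows whose name pair already appeared earlier, in either order.

    Hypothetical helper for illustration; cols defaults to the column
    names used in the question.
    """
    cols = list(cols)
    # Sort the names within each row, so ("Steve", "John") becomes ("John", "Steve")
    key = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), index=df.index)
    # duplicated(keep="first") marks every repeat of an unordered pair after its first row
    return df[~key.duplicated(keep="first")]


df = pd.DataFrame({
    "person1": ["John", "Peter", "Steve", "Peter", "Larry"],
    "person2": ["Steve", "Larry", "John", "Parker", "Peter"],
    "someMetric": [20, 13, 19, 5, 7],
})

# Keeps rows 0, 1 and 3, matching the sample run above
print(drop_swapped_duplicates(df))
```

Unlike the mask, this also drops exact repeats in the same order (e.g. two John, Steve rows), which may or may not be what you want, and it avoids building an n-by-n matrix.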