S Ringne - 2 days ago
Python Question

Drop rows from a pandas DataFrame where only some of the columns are the same

I have a table in a pandas DataFrame:

p_id c_id_x c_id_y
3 13 13
4 45 63
37 21 36
5 13 13
4 15 67
34 21 30


I want to drop the rows where c_id_x and c_id_y are the same, i.e.
3 13 13
and
5 13 13

I tried using
df.drop_duplicates()

but it won't work, since not all of the columns are the same: p_id is different.

Is there any other way to do it?

Answer

You can use boolean indexing:

mask = (df.c_id_x != df.c_id_y)
print (mask)

0    False
1     True
2     True
3    False
4     True
5     True
dtype: bool

print (df[mask])
   p_id  c_id_x  c_id_y
1     4      45      63
2    37      21      36
4     4      15      67
5    34      21      30
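
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample table from the question is built by hand with integer columns; the name `result` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "p_id":   [3, 4, 37, 5, 4, 34],
    "c_id_x": [13, 45, 21, 13, 15, 21],
    "c_id_y": [13, 63, 36, 13, 67, 30],
})

# True for the rows to keep, i.e. where the two columns differ.
mask = df["c_id_x"] != df["c_id_y"]

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Note that `df[mask]` returns a new DataFrame and leaves `df` unchanged, so assign the result back if you want to keep it.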

Another solution, using ne instead of !=:

mask = (df.c_id_x.ne(df.c_id_y))
print (df[mask])
   p_id  c_id_x  c_id_y
1     4      45      63
2    37      21      36
4     4      15      67
5    34      21      30
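
As a rough check that the operator and method forms select the same rows, the sketch below compares them on the sample data from the question. The inverted `eq` mask and the `reset_index` call are extras that are not part of the answer above; they are shown only as options.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "p_id":   [3, 4, 37, 5, 4, 34],
    "c_id_x": [13, 45, 21, 13, 15, 21],
    "c_id_y": [13, 63, 36, 13, 67, 30],
})

mask_op = df["c_id_x"] != df["c_id_y"]          # operator form
mask_ne = df["c_id_x"].ne(df["c_id_y"])         # method form
mask_not_eq = ~df["c_id_x"].eq(df["c_id_y"])    # "equal" mask, inverted with ~

# All three masks should be identical.
same = mask_op.equals(mask_ne) and mask_ne.equals(mask_not_eq)
print(same)

# Optional: renumber the index after filtering.
filtered = df[mask_ne].reset_index(drop=True)
print(filtered)
```

Using `eq` without the `~` would do the opposite and keep only the rows where the two columns match.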