jab jab - 3 months ago 19
Python Question

Pandas Drop Very First Duplicate only

Let's say I have the following series.

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])


I can select the duplicates of each value, excluding its first occurrence, with the following

s[s.duplicated(keep='first')]


I can select the duplicates of each value, excluding its last occurrence, with the following

s[s.duplicated(keep='last')]
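
A quick check of what these masks select on the example series may help. This is only a sketch, and the variable names are illustrative. duplicated(keep='first') leaves the first occurrence of each value unmarked and flags the later ones, keep='last' does the reverse, and keep=False flags every member of a duplicated group.

```python
import pandas

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# keep='first': first occurrence of each value is False, later repeats are True
later_repeats = s[s.duplicated(keep='first')]
print(later_repeats.index.tolist())    # [4, 5, 6, 10, 11, 13]

# keep='last': last occurrence of each value is False, earlier repeats are True
earlier_repeats = s[s.duplicated(keep='last')]
print(earlier_repeats.index.tolist())  # [3, 4, 5, 9, 10, 12]

# keep=False: every member of a duplicated group is True
all_repeats = s[s.duplicated(keep=False)]
print(all_repeats.index.tolist())      # [3, 4, 5, 6, 9, 10, 11, 12, 13]
```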


However, I'm looking to do the following.


  1. Drop only the very first duplicate, keep the other duplicates of that matching value, but also keep all other duplicates of varying values (including the first ones of each group). In the example above, we'd drop the first 3, but keep the other 3's. Keep all other remaining duplicates.

  2. Keep the first duplicate, drop the other duplicates of that matching value, but also keep all the other duplicates of other varying values. In the example above, we'd keep the first 3, but drop all other 3's. Keep all other remaining duplicates.



I've been racking my brain using cumsum() and diff() to capture the change when a duplicate has been detected. I imagine a solution would involve these, but I can't seem to get a perfect solution. I've gone through too many truth tables right now...
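
For illustration, here is one way the cumsum() and diff() idea could be sketched. It assumes the duplicates sit next to each other, as they do in the example series, and the names is_repeat and first_repeat are made up for this sketch. diff() is zero wherever a value equals the one before it, and the running count from cumsum() picks out the first such position.

```python
import pandas

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# True wherever a value equals its predecessor (assumes adjacent duplicates)
is_repeat = s.diff().eq(0)

# The running count of repeats equals 1 only at the very first repeat
first_repeat = is_repeat & is_repeat.cumsum().eq(1)

# Removing that one row drops a single 3 and leaves everything else in place
result = s[~first_repeat]
print(result.tolist())  # [0, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7]
```

Note that this removes the row at label 4 rather than label 3. The remaining values are the same, only the surviving index labels differ.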

Answer Source
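ind = s[s.duplicated(keep=False)].index[0]

gives you the index label of the first record that has a duplicate (label 3 here). keep=False flags every member of a duplicated group, including its first occurrence; the default keep='first' would leave the first 3 unflagged and return label 4 instead. Use it to drop.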

In [45]: s.drop(ind)
Out[45]:
0     0
1     1
2     2
4     3
5     3
6     3
7     4
8     5
9     6
10    6
11    6
12    7
13    7
dtype: int64
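
A possible sketch covering both cases from the question, under the assumption that the series contains at least one duplicated value and no NaN entries (otherwise the iloc[0] lookup or the equality test would need guarding). The names first_dup_value, case1 and case2 are illustrative only. It uses duplicated(keep=False) so that the first member of a duplicated group is flagged as well.

```python
import pandas

s = pandas.Series([0, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7])

# All rows that belong to a duplicated group, first occurrences included
dups = s[s.duplicated(keep=False)]
first_label = dups.index[0]      # label of the very first duplicated row
first_dup_value = dups.iloc[0]   # its value (3 in this example)

# Case 1: drop only that first row, keep every other row
case1 = s.drop(first_label)
print(case1.tolist())  # [0, 1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7]

# Case 2: keep the first row, drop the later rows with the same value only
later_of_first_group = (s == first_dup_value) & s.duplicated(keep='first')
case2 = s[~later_of_first_group]
print(case2.tolist())  # [0, 1, 2, 3, 4, 5, 6, 6, 6, 7, 7]
```

In both cases the 6's and 7's are left untouched, since only the first duplicated group is modified.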