keynesiancross - 6 months ago
Python Question

Python Pandas - how to apply boolean series to extract rows from dataframe

I have a boolean Series that I got from using .duplicated(). I'm trying to figure out which rows of my DataFrame return True, and what data those rows contain. How can I use this boolean Series to extract those rows?

Thanks - KC

EDIT- Data sample:

level_0  index  ID     date_time        value
54967    54967  54967  1/06/2016 19:30  1.00
54968    54968  54968  1/06/2016 19:30  2.00
54969    54969  54969  1/06/2016 19:43  3.00
54970    54970  54970  1/06/2016 19:46  4.00

Answer

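Try this: index the DataFrame with the boolean Series (df[dups]) to keep only the rows where it is True. The session below assumes numpy and pandas are imported as np and pd.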

In [427]: df = pd.DataFrame(np.random.randint(1,5,10), columns=['a'])

In [428]: df
Out[428]:
   a
0  4
1  4
2  4
3  3
4  1
5  3
6  4
7  1
8  3
9  2

In [429]: dups = df.a.duplicated()

In [430]: dups
Out[430]:
0    False
1     True
2     True
3    False
4    False
5     True
6     True
7     True
8     True
9    False
Name: a, dtype: bool

In [431]: df[dups]
Out[431]:
   a
1  4
2  4
5  3
6  4
7  1
8  3
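
The same idea can be sketched on the sample data from the question. This is only an illustration under stated assumptions: the frame is rebuilt by hand from the four sample rows, the numbers 54967 to 54970 are used as the index, and duplicates are assumed to be judged on the date_time column alone (the question does not say which columns matter). Note that duplicated() by default flags only the second and later occurrences; keep=False flags every row that has a twin.

```python
import pandas as pd

# Rebuilt by hand from the sample rows in the question (assumption:
# the repeated 5496x numbers serve as the index).
df = pd.DataFrame(
    {
        "date_time": [
            "1/06/2016 19:30",
            "1/06/2016 19:30",
            "1/06/2016 19:43",
            "1/06/2016 19:46",
        ],
        "value": [1.0, 2.0, 3.0, 4.0],
    },
    index=[54967, 54968, 54969, 54970],
)

# Assumption: a "duplicate" means a repeated date_time.
# Default keep="first": only the second and later occurrences are True.
later_dups = df.duplicated(subset="date_time")
print(df[later_dups])

# keep=False marks every row that shares its date_time with another row.
all_dups = df.duplicated(subset="date_time", keep=False)
print(df.loc[all_dups])

# ~ inverts the mask, leaving the rows that were not flagged.
print(df[~later_dups])
```

df[mask] and df.loc[mask] select the same rows here; .loc is handy when you also want to pick columns, for example df.loc[all_dups, "value"].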