Yariv - 2 months ago
Python Question

How to find duplicate names using pandas?

I have a pandas.DataFrame with a column called name containing strings.
I would like to get a list of the names which occur more than once in the column. How do I do that?

I tried:

funcs_groups = funcs.groupby(funcs.name)
funcs_groups[(funcs_groups.count().name>1)]


But it doesn't filter out the singleton names.

Answer

If you want to find the rows with a duplicated name (every occurrence except the first one seen), you can try this:

In [16]: import pandas as pd
In [17]: p1 = {'name': 'willy', 'age': 10}
In [18]: p2 = {'name': 'willy', 'age': 11}
In [19]: p3 = {'name': 'zoe', 'age': 10}
In [20]: df = pd.DataFrame([p1, p2, p3])

In [21]: df
Out[21]: 
   age   name
0   10  willy
1   11  willy
2   10    zoe

In [22]: df.duplicated('name')
Out[22]: 
0    False
1     True
2    False
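
To go from that boolean mask to the list of names the question asks for, something like the following might work. This is only a sketch: it rebuilds the same example frame, assumes a pandas version where duplicated accepts keep=False, and the variable names dup_mask, dup_names, and dup_names_by_count are made up for illustration.

```python
import pandas as pd

# Same example data as in the session above
df = pd.DataFrame(
    [
        {"name": "willy", "age": 10},
        {"name": "willy", "age": 11},
        {"name": "zoe", "age": 10},
    ]
)

# keep=False flags every occurrence of a repeated name, including the first
dup_mask = df["name"].duplicated(keep=False)

# Collapse the flagged rows to the distinct names that repeat
dup_names = df.loc[dup_mask, "name"].unique().tolist()

# Alternative: count occurrences per name and keep those seen more than once
counts = df["name"].value_counts()
dup_names_by_count = counts[counts > 1].index.tolist()

print(dup_names)
print(dup_names_by_count)
```

Both approaches drop the singleton names. The value_counts version is handy if you also want to know how many times each name occurs.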