roffster - 4 months ago
Python Question

Splitting duplicates into separate table - Pandas

In Pandas, I can drop duplicate rows from a DataFrame based on a single column using the

data.drop_duplicates('foo')


command. I'm wondering if there is a way to capture the dropped rows in another table for independent review.

Answer

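You can call the duplicated method on the foo column and use the resulting boolean mask to subset your original DataFrame. By default, duplicated marks every occurrence after the first as True, and those are the same rows that drop_duplicates('foo') removes: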

data.loc[data['foo'].duplicated(), :]

As an example:

import pandas as pd

data = pd.DataFrame({'foo': [1, 1, 1, 2, 2, 2], 'bar': [1, 1, 2, 2, 3, 3]})
data

# bar foo
#0  1   1
#1  1   1
#2  2   1
#3  2   2
#4  3   2
#5  3   2


data.loc[data['foo'].duplicated(), :]
# bar foo
#1  1   1
#2  2   1
#4  3   2
#5  3   2
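
If you want both tables at once, one way is to keep the mask in a variable and use it twice. This is only a sketch built on the example above; the names is_dup, kept, dropped and all_in_dup_groups are made up for illustration. The last line uses keep=False, which (as far as I know) flags every row in a duplicated group, including the first occurrence, in case you want to review the whole group rather than only the dropped rows.

```python
import pandas as pd

data = pd.DataFrame({'foo': [1, 1, 1, 2, 2, 2], 'bar': [1, 1, 2, 2, 3, 3]})

# True for each row whose 'foo' value already appeared in an earlier row
# (keep='first' is the default, matching drop_duplicates).
is_dup = data['foo'].duplicated()

kept = data.loc[~is_dup]     # the rows drop_duplicates('foo') would keep
dropped = data.loc[is_dup]   # the rows drop_duplicates('foo') would discard

# keep=False marks every member of a duplicated group, first occurrence included.
all_in_dup_groups = data.loc[data['foo'].duplicated(keep=False)]
```

Every row of data lands in exactly one of kept or dropped, so nothing is lost by the split.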