unasalusvictis - 20 days ago
Python Question

'subset' not working for drop_duplicates pandas dataframe

I have a df that looks like this:

    A                B    C               D     NEW
 0  1       Adhoc_Task  WID          WI_DTL      []
 1  1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
 2  1  Arun_adhoc_load  ATT          IXN_10  (IXN,)
 3  1  Arun_adhoc_load  ATT         IXN_100  (IXN,)
 4  1  Arun_adhoc_load  ATT         IXN_101  (IXN,)
 5  2    Batch_Support  ATT      CDS_STATUS      []
 6  2    Batch_Support  ATT     CDS_CONTROL      []
 7  2    Batch_Support  ATT  CDS_ORA_STATUS      []
 8  2    Batch_Support  ATT      REP_FILTER      []
 9  1      online_load  ATT           TAX_3  (TAX,)
10  1      online_load  ATT           TAX_4  (TAX,)
11  1      online_load  ATT           TAX_8  (TAX,)
12  1      online_load  ATT          TAX_11  (TAX,)


Desired output would look like this:

    A                B    C               D     NEW
 0  1       Adhoc_Task  WID          WI_DTL      []
 1  1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
 5  2    Batch_Support  ATT      CDS_STATUS      []
 9  1      online_load  ATT           TAX_3  (TAX,)


I'm trying to drop duplicate rows based off column B. However, when I run

df.drop_duplicates(subset = ['B'], keep='first', inplace=True)


I get the following error:

TypeError: drop_duplicates() got an unexpected keyword argument 'subset'


I'm running pandas 0.19.1 from python 3, so I took a look at the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

I haven't the foggiest of what I'm doing wrong with subset. How would I drop duplicates from the DataFrame based off the values in one column?

Answer

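Somewhere in your code, df has become a Series object rather than a DataFrame. Check type(df) just before the failing drop_duplicates call. Series.drop_duplicates has no subset argument (a Series has only one column of values to compare), which is why you get the TypeError.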
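
Here is a minimal sketch of the difference. The question doesn't show how df was built, so the way the Series arises below (single-bracket column selection) is only an assumption about one common cause, and the small frame is a cut-down stand-in for the data in the question, not the real thing.

```python
import pandas as pd

# Cut-down stand-in for the frame in the question (illustrative values only).
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 1],
    "B": ["Adhoc_Task", "Arun_adhoc_load", "Arun_adhoc_load",
          "Batch_Support", "Batch_Support", "online_load"],
    "D": ["WI_DTL", "IXN_1", "IXN_10", "CDS_STATUS", "CDS_CONTROL", "TAX_3"],
})

# Assumed cause: selecting one column with single brackets returns a Series.
series = df["B"]
print(type(series))

# Series.drop_duplicates does not accept subset, so this raises TypeError.
try:
    series.drop_duplicates(subset=["B"])
    series_error = None
except TypeError as exc:
    series_error = str(exc)
print(series_error)

# On an actual DataFrame, subset works and keeps the first row per value of B.
print(type(df))
deduped = df.drop_duplicates(subset=["B"], keep="first")
print(deduped)
```

If type(df) does turn out to be a Series, trace back to where it was last assigned. Also note that inplace=True makes drop_duplicates return None, so don't assign the result of that call back to df.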