Michael Perdue - 4 months ago
Python Question

Pandas: remove observations from a data frame based on multiple columns

I have a data frame and I need to delete certain observations from it based upon the values of other columns.

cid        date    unemployment  billsum  id.thomas  loansum  feccanid   amtsum
N00003147  2005.0  5.6           1.0      1336       2.0      S4TN00153  4.500
N00009082  2007.0  3.7           1.0      11         2.0      S6CO00168  13.000
N00013870  2007.0  4.6           3.0      1697       17.5     S2MN00126  1636.709
N00002091  2007.0  3.1           1.0      246        11.5     S0ID00057  238.795
N00006515  2007.0  3.8           2.0      1319       49.5     S8NM00010  966.286


I would like, for example, to remove the rows where id.thomas == 1763, but only when date is 2008 through 2012 (my date range is 2005-14).
I have tried:

bill_amtmerge = bill_amtmerge[bill_amtmerge['id.thomas']!= 1763 & (bill_amtmerge['date'] > 2007)]


Does anyone have an idea how to do this?

Answer

Try this: build a boolean mask for the rows you want to drop, then keep everything else.

mask = (df['id.thomas'] == 1763) & (df['date'] >= 2008) & (df['date'] <= 2012)
df = df[~mask]
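
For a self-contained check, here is a minimal sketch of the same mask on a made-up frame (the rows are illustrative, not the asker's data, and only the two columns used in the filter are included). It writes the date range with Series.between, which includes both endpoints by default; that substitution is my assumption of an equivalent spelling, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: only the two columns used in the filter.
df = pd.DataFrame({
    "id.thomas": [1763, 1763, 1763, 1336, 11],
    "date": [2007.0, 2008.0, 2012.0, 2009.0, 2013.0],
})

# Rows to drop: id.thomas is 1763 AND date lies in 2008..2012 (both ends included).
mask = (df["id.thomas"] == 1763) & df["date"].between(2008, 2012)

# Keep every row that the mask does not flag.
filtered = df[~mask]
print(filtered)
```

The 1763 row from 2007 survives, as do all rows for other ids, whatever their year.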

Alternatively, you can negate the condition and select the rows to keep directly:

df = df[(df['id.thomas'] != 1763) | (df['date'] < 2008) | (df['date'] > 2012)]
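
The attempt in the question goes wrong for two separate reasons. First, & binds more tightly than != in Python, so without parentheses around the first comparison the expression is parsed as bill_amtmerge['id.thomas'] != (1763 & (...)). Second, even with the parentheses added, joining the two "keep" conditions with & also throws away every row from 2007 or earlier. The sketch below, on made-up data with illustrative variable names, checks that the two approaches above agree and shows how the parenthesised attempt differs.

```python
import pandas as pd

# Hypothetical sample data, chosen to cover each case.
df = pd.DataFrame({
    "id.thomas": [1763, 1763, 1763, 1336, 11],
    "date": [2007, 2008, 2013, 2006, 2010],
})

# Approach 1: flag the rows to drop, then invert the mask.
drop = (df["id.thomas"] == 1763) & (df["date"] >= 2008) & (df["date"] <= 2012)
kept_by_mask = df[~drop]

# Approach 2: the negated condition, selecting the rows to keep.
keep = (df["id.thomas"] != 1763) | (df["date"] < 2008) | (df["date"] > 2012)
kept_by_negation = df[keep]

# The question's attempt with parentheses added: it keeps only rows where
# BOTH conditions hold, so rows from 2007 or earlier are lost as well.
attempt = df[(df["id.thomas"] != 1763) & (df["date"] > 2007)]

print(kept_by_mask.equals(kept_by_negation))
print(len(kept_by_mask), len(attempt))
```

On this sample, both approaches keep the same four rows, while the parenthesised attempt keeps only one.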