Roman Roman - 4 months ago 87
Python Question

How to subtract rows of one pandas data frame from another?

The operation that I want to do is similar to a merge. For example, with an inner merge we get a data frame that contains the rows that are present in the first AND the second data frame. With an outer merge we get a data frame that contains the rows that are present EITHER in the first OR in the second data frame.

What I need is a data frame that contains the rows that are present in the first data frame AND NOT present in the second one. Is there a fast and elegant way to do it?

Answer

How about something like the following?

print(df1)

     Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12

print(df2)

     Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6

As long as there is a non-key column with the same name in both data frames, you can let the suffixes that merge adds do the work (if there is no such column, you could create one to use temporarily, e.g. df1['common'] = 1 and df2['common'] = 1):

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

    Team  Year  foo_x  foo_y
0  Hawks  2001      5    NaN
1  Hawks  2004      4    NaN
2   Nets  1987      3    NaN
4   Nets  2001      8    NaN
5   Nets  2000     10    NaN
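
If you would rather not depend on a shared non-key column, merge's indicator flag can drive a similar filter. The following is a minimal sketch, not part of the original answer: it assumes a pandas version that supports indicator=True and drop(columns=...), rebuilds df1 and df2 from the tables above, and uses the illustrative names right_keys and only_in_df1.

```python
import pandas as pd

# Sample frames rebuilt from the tables shown in the answer.
df1 = pd.DataFrame({
    'Team': ['Hawks', 'Hawks', 'Nets', 'Nets', 'Nets', 'Nets', 'Heat', 'Pacers'],
    'Year': [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    'foo': [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    'Team': ['Pacers', 'Heat', 'Nets'],
    'Year': [2003, 2004, 1988],
    'foo': [12, 6, 6],
})

keys = ['Team', 'Year']

# Keep only the deduplicated key columns of df2, so the merge adds no
# suffixed columns and cannot multiply rows of df1.
right_keys = df2[keys].drop_duplicates()

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = df1.merge(right_keys, on=keys, how='left', indicator=True)

# Rows whose key was not found in df2 are marked 'left_only'.
only_in_df1 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df1)
```

Because only the key columns of df2 take part in the merge, the result keeps the original foo column name instead of foo_x and foo_y.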

Or you can use isin, but you would have to create a single key:

df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)
print(df1[~df1.key.isin(df2.key)])

    Team  Year  foo        key
0  Hawks  2001    5  Hawks2001
1  Hawks  2004    4  Hawks2004
2   Nets  1987    3   Nets1987
4   Nets  2001    8   Nets2001
5   Nets  2000   10   Nets2000
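
A concatenated string key can in principle collide (for example, team 'Nets1' with year 987 and team 'Nets' with year 1987 both give 'Nets1987'). One way around that is to compare the key columns as tuples through a MultiIndex. This is a hedged sketch rather than the answer's own method: it assumes a pandas version that provides MultiIndex.from_frame, rebuilds the sample frames from the tables above, and the names idx1, idx2 and result are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the tables shown in the answer.
df1 = pd.DataFrame({
    'Team': ['Hawks', 'Hawks', 'Nets', 'Nets', 'Nets', 'Nets', 'Heat', 'Pacers'],
    'Year': [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    'foo': [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    'Team': ['Pacers', 'Heat', 'Nets'],
    'Year': [2003, 2004, 1988],
    'foo': [12, 6, 6],
})

keys = ['Team', 'Year']

# Build (Team, Year) tuple indexes without modifying either frame.
idx1 = pd.MultiIndex.from_frame(df1[keys])
idx2 = pd.MultiIndex.from_frame(df2[keys])

# isin returns a boolean array aligned with df1's rows; negate it to keep
# only the rows whose (Team, Year) pair does not occur in df2.
result = df1[~idx1.isin(idx2)]
print(result)
```

This leaves df1 and df2 unchanged, so no temporary key column has to be dropped afterwards.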