HalfPintBoy HalfPintBoy - 9 days ago 6
SQL Question

Pandas analogue of JOIN with WHERE clause

I'm doing joining of two dataframe (A and B) in python's pandas.

The goal is to receive all the pure rows from B (sql analogue- right join B on A.client_id=B.client_id where A.client_id is null)

In pandas all I know for this operation is to do merging but I don't know how to set up the conditions (where clause):

x=pd.merge(A,B,how='right',on=['client_id','client_id']


Thanks!

Answer

option 1
indicator=True

A.merge(B, on='client_id', how='right', indicator=True) \
    .query('_merge == "right_only"').drop('_merge', 1)

setup

A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6]))
B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))

results

enter image description here

more explanation
indicator=True puts another column in the results of the merge that indicates whether that rows results are from the left, right, or both.

A.merge(B, on='client_id', how='outer', indicator=True)

enter image description here

So, I just use query to filter out the right_only indicator then drop that column.


option 2
not really a merge. You can use query again to only pull rows of B where its 'client_id's are not in A

B.query('client_id not in @A.client_id')

or an equivalent way of saying the same thing (but faster)

B[~B.client_id.isin(A.client_id)]

enter image description here