I'm doing joining of two dataframe (A and B) in python's pandas.
The goal is to receive all the pure rows from B (sql analogue- right join B on A.client_id=B.client_id where A.client_id is null)
In pandas all I know for this operation is to do merging but I don't know how to set up the conditions (where clause):
A.merge(B, on='client_id', how='right', indicator=True) \ .query('_merge == "right_only"').drop('_merge', 1)
A = pd.DataFrame(dict(client_id=[1, 2, 3], valueA=[4, 5, 6])) B = pd.DataFrame(dict(client_id=[3, 4, 5], valueB=[7, 8, 9]))
indicator=True puts another column in the results of the merge that indicates whether that rows results are from the left, right, or both.
A.merge(B, on='client_id', how='outer', indicator=True)
So, I just use
query to filter out the
right_only indicator then drop that column.
not really a merge. You can use
query again to only pull rows of
B where its
'client_id's are not in
B.query('client_id not in @A.client_id')
or an equivalent way of saying the same thing (but faster)