Gbusch15 Gbusch15 - 8 months ago 36
Python Question

Python Pandas Dataframes comparison on 2 columns (with where clause)

I'm stuck on particluar python question here. I have 2 dataframes DF1 and DF2. In both, I have 2 columns pID and yID (which are not indexed, just default). I'm look to add a column Found in DF1 where the respective values of columns (pID and yID) were found in DF2. Also, I would like to zone in on just values in DF2 where aID == 'Text'.

I believe the below gets me the 1st part of this question; however, I'm unsure how as to incorporate the where.

DF1['Found'] = (DF1[['pID', 'yID']] == DF2[['pID','yID']]).all(axis=1).astype(bool)


Suggestions or answers would be most appreciated. Thanks.

Answer Source

You could subset the second dataframe containing aID == 'Text' to get a reduced DF from which select those portions of columns to be compared against the first dataframe.

Use DF.isin() to check if the values that are present under these column names match or not. And, .all(axis=1) returns True if both the columns happen to be True, else they become False. Convert the boolean series to integers via astype(int) and assign the result to the new column, Found.

df1_sub = df1[['pID', 'yID']]
df2_sub = df2.query('aID=="Text"')[['pID', 'yID']]
df1['Found'] = df1_sub.isin(df2_sub).all(axis=1).astype(int)
df1

enter image description here

Demo DF's used:

df1 = pd.DataFrame(dict(pID=[1,2,3,4,5],
                       yID=[10,20,30,40,50]))

df2 = pd.DataFrame(dict(pID=[1,2,8,4,5],
                       yID=[10,12,30,40,50],
                       aID=['Text','Best','Text','Best','Text']))