John Laudun John Laudun - 3 years ago 159
Python Question

Matching data across pandas dataframes

I have two dataframes, one which has authors and their texts -- there are other columns as well -- and another which has authors and their genders and disciplines.

DF1
====================================
author date text
------------------------------------
a1 2006 "Thank you for..."
a2 2007 "When I was asked..."
a3 2014 "Biology is the ..."
a2 2010 "In the intervening..."

DF2
====================================
author gender discipline
------------------------------------
a2 male psychologist
a1 female neurologist
a3 female biologist


I am scrubbing through the
pandas
documentation and searching SO and other sites trying to see how I can match authors in DF1 with their genders in DF2. I don't care if I do it inplace in DF1 or if I need to create a new dataframe, so long as the new data frame has all the information in DF1 plus the additional information, gender and/or discipline, from DF2.

I don't even have the beginnings of code here -- I just finished scrubbing DF2 of all kinds of unicode errors, so I'm a bit at wit's end at this point in the day.

Answer Source

Option 1
pd.DataFrame.merge

DF1.merge(DF2[['author', 'gender']], 'left')

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 2
pd.Series.map

d = dict(DF2[['author', 'gender']].values)
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 2.1
other ways to make the d

d = DF2.set_index('author').gender
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 2.2
other ways to make the d

d = dict(zip(DF2.author, DF2.gender))
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 3
pd.DataFrame.join

DF1.join(DF2.set_index('author').gender, on='author')

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download