TTaa TTaa - 3 years ago 337
Python Question

Python pandas dataframe fillna with column in different dataframe

I want to fill Nan values in 'gvkey' column with another column values in different dataframe.

df
wrds_id isin_code gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 Nan 1000.0
2 1006 US1948302047 Nan 1004.0
3 1007 US0309541011 Nan 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 Nan 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 Nan 1004.0
8 1004 US4448591028 Nan 1004.0
9 1004 US4448591028 Nan 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 Nan 1013.0
12 1013 BE0003755692 Nan 1013.0


Using this frame, based on isin_code values, replace Nan value with gvkey_ciq_new values in first dataframe

df1
isin_code gvkey_ciq_new
0 US0309541011 1485.0
1 IL0006046119 2018.0
3 US1948302047 3176.0
4 US2376881064 3760.0
5 BE0003755692 5150.0
7 US4448591028 5776.0
9 GB0004544929 5898.0


Final dataframe I want to make,

finaldf
wrds_id isin_imp gvkey gvkey_new
0 1004 US0003611052 1004.0 1004.0
1 1005 US1948302047 3176.0 1004.0
2 1006 US1948302047 3176.0 1004.0
3 1007 US0309541011 1485.0 1004.0
4 1007 US0003611052 1004.0 1004.0
5 1008 IL0006046119 2018.0 1004.0
6 1008 US0003611052 1004.0 1004.0
7 1009 US4448591028 5776.0 1004.0
8 1004 US4448591028 5776.0 1004.0
9 1004 US4448591028 5776.0 1004.0
10 1013 US0008861017 1013.0 1013.0
11 1013 BE0003755692 5150.0 1013.0
12 1013 BE0003755692 5150.0 1013.0


How can I use map function to make the dataframe?

Answer Source

Don't use any loops. Merge the dataframes and use numpy.where with pandas.inull

Set things up

from io import StringIO

import numpy
import pandas

d1 = StringIO("""\
     wrds_id    isin_code   gvkey   gvkey_new
 0  1004    US0003611052    1004.0  1004.0
 1  1005    US1948302047    Nan     1000.0
 2  1006    US1948302047    Nan     1004.0
 3  1007    US0309541011    Nan     1004.0
 4  1007    US0003611052    1004.0  1004.0
 5  1008    IL0006046119    Nan     1004.0
 6  1008    US0003611052    1004.0  1004.0
 7  1009    US4448591028    Nan     1004.0
 8  1004    US4448591028    Nan     1004.0
 9  1004    US4448591028    Nan     1004.0
 10 1013    US0008861017    1013.0  1013.0
 11 1013    BE0003755692    Nan     1013.0
 12 1013    BE0003755692    Nan     1013.0
 """)

d2 = StringIO("""\
    isin_code   gvkey_ciq_new
0   US0309541011    1485.0
1   IL0006046119    2018.0
3   US1948302047    3176.0
4   US2376881064    3760.0
5   BE0003755692    5150.0
7   US4448591028    5776.0
9   GB0004544929    5898.0
""")
df1 = pandas.read_table(d1, sep='\s+', na_values=['Nan'])
df2 = pandas.read_table(d2, sep='\s+', na_values=['Nan'])

Merge and compute the final column

result = (
    df1.merge(df2, on='isin_code', how='left')
       .assign(gvkey=lambda df: numpy.where(
           pandas.isnull(df['gvkey']),  # condition
           df['gvkey_ciq_new'],         # values if true
           df['gvkey']                  # values if false
       ))[df1.columns]                  # only keep original columns
)

print(result)

And that gives me:

    wrds_id     isin_code   gvkey  gvkey_new
0      1004  US0003611052  1004.0     1004.0
1      1005  US1948302047  3176.0     1000.0
2      1006  US1948302047  3176.0     1004.0
3      1007  US0309541011  1485.0     1004.0
4      1007  US0003611052  1004.0     1004.0
5      1008  IL0006046119  2018.0     1004.0
6      1008  US0003611052  1004.0     1004.0
7      1009  US4448591028  5776.0     1004.0
8      1004  US4448591028  5776.0     1004.0
9      1004  US4448591028  5776.0     1004.0
10     1013  US0008861017  1013.0     1013.0
11     1013  BE0003755692  5150.0     1013.0
12     1013  BE0003755692  5150.0     1013.0
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download