feedthemachine feedthemachine - 5 months ago 21
Python Question

merging 3 dataframes with different indexes

I have 3 DFs: GDP, energy, ScimEn.

print(GDP.index.name) > Country
print(energy.index.name) > None
print(ScimEn.index.name) > None


Although energy and ScimEn do have a 'Country' column.
I would like to merge all DFs on 'Country'. How can I do that? I have tried to do the following

newdf = (pd.merge(energy, ScimEn, how='inner', on='Country').merge(GDP, how='inner', on=GDP.index.name))
> KeyError: 'Country'


If I try:

newdf = (pd.merge(energy, ScimEn, how='inner', on='Country').
merge(GDP, how='inner', left_index=True))
raise MergeError('Must pass right_on or right_index=True')
pandas.tools.merge.MergeError: Must pass right_on or right_index=True


If I try:

newdf = (pd.merge(energy, ScimEn, how='inner', on='Country').
merge(GDP, how='inner', left_index=True, right_index=True))


It returns:

Empty DataFrame
Columns: [Country, Energy Supply, Energy Supply per Capita, % Renewable, Rank, Documents, Citable documents, Citations, Self-citations, Citations per document, H index, Country Code, Indicator Name, Indicator Code, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]
Index: []

Answer Source

You can add reset_index to GDP:

newdf = pd.merge(energy, ScimEn, on='Country').merge(GDP.reset_index(), on='Country')

Alternative if many DataFrames:

from  functools import reduce
dfs = [energy, ScimEn, GDP.reset_index()]
newdf = reduce(lambda left,right: pd.merge(left,right,on='Country'), dfs)

Solution without reset_index with join:

newdf = pd.merge(energy, ScimEn, on='Country').join(GDP, on='Country', how='inner')