user113531 - 4 months ago
Python Question

Pandas sum multiple dataframes

I have multiple dataframes, each with a multi-level index and a value column. I want to add up the value columns across all the dataframes.

df1 + df2


Not every index label is present in each dataframe, so I get NaN on any row that is missing from at least one of the dataframes.

How can I overcome this and treat a row that is missing from a dataframe as having a value of 0 there?

E.g., I want to get

val
a 2
b 4
c 3
d 3


from pd.DataFrame({'val':{'a': 1, 'b':2, 'c':3}}) + pd.DataFrame({'val':{'a': 1, 'b':2, 'd':3}}) instead of

val
a 2
b 4
c NaN
d NaN

Answer

Use the add method with the fill_value=0 parameter.

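import pandas as pd

df1 = pd.DataFrame({'val':{'a': 1, 'b':2, 'c':3}})
df2 = pd.DataFrame({'val':{'a': 1, 'b':2, 'd':3}})

# fill_value=0 substitutes 0 for a label that is missing from one of the two frames
df1.add(df2, fill_value=0)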
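
Since the question mentions more than two dataframes, one way to extend this is to fold the list pairwise with functools.reduce. This is only a sketch: df3 and the name frames are made up for illustration and are not part of the original question.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'c': 3}})
df2 = pd.DataFrame({'val': {'a': 1, 'b': 2, 'd': 3}})
df3 = pd.DataFrame({'val': {'a': 10, 'e': 5}})  # hypothetical third frame

frames = [df1, df2, df3]

# Add the frames two at a time; at each step a label missing from one side counts as 0.
total = reduce(lambda left, right: left.add(right, fill_value=0), frames)
print(total)
```

Note that the result column is float, because the alignment step introduces NaN before the fill value is applied. A value that is missing on both sides of a single addition still comes out as NaN.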


MultiIndex example

import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])

np.random.seed([3,1415])
df1 = pd.DataFrame(np.random.randn(4, 1), idx1, ['val'])
df2 = pd.DataFrame(np.random.randn(4, 1), idx2, ['val'])

df1

df2

df1.add(df2, fill_value=0)
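
Because the random values above are hard to check by eye, here is a small sketch of the same MultiIndex case with integer values chosen purely for illustration (m1 and m2 are hypothetical names). It also shows a different technique, pd.concat followed by a groupby on the index levels, which should give the same sums and takes any number of frames at once.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'B'), ('b', 'A'), ('b', 'D')])
idx2 = pd.MultiIndex.from_tuples([('a', 'A'), ('a', 'C'), ('b', 'A'), ('b', 'C')])

# Illustrative integer values, not the random values used above
m1 = pd.DataFrame({'val': [1, 2, 3, 4]}, index=idx1)
m2 = pd.DataFrame({'val': [10, 20, 30, 40]}, index=idx2)

# Same approach as the answer: index tuples missing from one frame count as 0.
via_add = m1.add(m2, fill_value=0)

# Alternative: stack the frames, then sum the rows that share an index tuple.
via_groupby = pd.concat([m1, m2]).groupby(level=[0, 1]).sum()

print(via_add)
print(via_groupby)
```

Only ('a', 'A') and ('b', 'A') occur in both frames, so those rows are the only true sums (11 and 33); every other row carries over its single value. The add version returns floats, while the groupby version keeps the integer dtype here.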
