Alex - 1 year ago 72

Python Question

I have several pandas dataframes, each with one column of ints, and I would like to create a new dataframe with the sum of their values at each index. Their indexes have some overlapping entries, and these are the indices whose values I want to add together. If an index is found in only one dataframe, I want the new dataframe (or series) to include that index and use that one value as its value. This seems straightforward, but I can't figure it out, and the documentation seems to focus on joining dataframes more than on combining their values.

Basically, given two dataframes that look like this:

`>>> df1`

0

a 3

b 7

d 2

`>>> df2`

0

c 11

d 19

And I'd like to have the final output look like this:

`>>> df3`

0

a 3

b 7

c 11

d 21

Thanks in advance.

Answer Source

Simplest answer, if you're only adding two dataframes:

```
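import pandas as pd
# example inputs from the question
df1 = pd.DataFrame([3, 7, 2], index=['a', 'b', 'd'])
df2 = pd.DataFrame([11, 19], index=['c', 'd'])
# fill_value=0 treats a label missing from one dataframe as 0; without it the sum for that label would be NaN
df3 = df1.add(df2, fill_value=0)
df3
Out[18]:
      0
a   3.0
b   7.0
c  11.0
d  21.0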
```
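To see why `fill_value` matters, here is a small sketch comparing plain `+` with `add(..., fill_value=0)` on the frames from the question. The names `plain` and `filled` are only for illustration.

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame([3, 7, 2], index=['a', 'b', 'd'])
df2 = pd.DataFrame([11, 19], index=['c', 'd'])

# Plain addition aligns on the index, but a label that is missing
# from either side produces NaN in the result.
plain = (df1 + df2).sort_index()

# fill_value=0 substitutes 0 for the missing side before adding,
# so labels found in only one frame keep their single value.
filled = df1.add(df2, fill_value=0).sort_index()

print(plain)
print(filled)
```

Only `d` appears in both frames, so it is the only label with a number in `plain`; `filled` has a value for every label.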

However, if you want to add more than two, the easiest way is to concatenate them side by side and sum across each row:

```
import pandas as pd
# initialize example inputs (df1 and df2 are the frames from the question)
df1 = pd.DataFrame([3, 7, 2], index=['a', 'b', 'd'])
df2 = pd.DataFrame([11, 19], index=['c', 'd'])
df3 = pd.DataFrame([3, 7, 11, 21], index=['a', 'b', 'c', 'd'])
# when concatenating with axis=1, columns are placed side by side; rows are matched on their index label, and labels missing from a frame become NaN
aggregate_df = pd.concat([df1, df2, df3], axis=1)
# sum across columns (axis=1); NaN is skipped by default. Sort the index and convert the resulting Series to a DataFrame
df4 = aggregate_df.sum(axis=1).sort_index().to_frame()
df4
Out[11]:
      0
a   6.0
b  14.0
c  22.0
d  42.0
```
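Both approaches return floats, because the missing entries pass through NaN on the way. If integer output matters, or the frames arrive as a list of unknown length, one possible variation is sketched below. The helper name `sum_frames` and the third frame `extra` are hypothetical, and the cast back to `int` assumes every input holds only integers.

```python
from functools import reduce

import pandas as pd


def sum_frames(frames):
    """Sum DataFrames label by label, treating a missing label as 0."""
    # Fold the list pairwise with DataFrame.add; fill_value=0 keeps
    # labels that appear in only one of the two operands.
    total = reduce(lambda left, right: left.add(right, fill_value=0), frames)
    # Assumption: all inputs are integer-valued, so the cast is lossless.
    return total.sort_index().astype(int)


df1 = pd.DataFrame([3, 7, 2], index=['a', 'b', 'd'])
df2 = pd.DataFrame([11, 19], index=['c', 'd'])
extra = pd.DataFrame([5, 1], index=['a', 'e'])  # hypothetical third frame

result = sum_frames([df1, df2, extra])
print(result)
```

This does one alignment per frame, so for a long list the `pd.concat(...).sum(axis=1)` version above is probably the better default; the sketch is mainly useful when you want to stay with `add` and control the output dtype.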