Pauline - 1 year ago 324
Python Question

Sum of several columns from a pandas dataframe

So say I have the following table:

``````
In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1,2,3], 'b':[2,4,6], 'c':[1,1,1]})

In [3]: df
Out[3]:
   a  b  c
0  1  2  1
1  2  4  1
2  3  6  1
``````

I can sum `a` and `b` this way:

``````
In [4]: sum(df['a']) + sum(df['b'])
Out[4]: 18
``````

However, this is not very convenient for larger DataFrames, where many columns have to be summed together.

Is there a neater way to sum columns (similar to the below)? What if I want to sum the entire DataFrame without specifying the columns?

``````
In [4]: sum(df[['a', 'b']]) # that will not work!
Out[4]: 18
In [4]: sum(df) # that will not work!
Out[4]: 21
``````

I think you can chain two `sum` calls: `DataFrame.sum` first returns a `Series` of per-column sums, and `Series.sum` then adds those up into a single value:

``````
print(df[['a','b']].sum())
a     6
b    12
dtype: int64

print(df[['a','b']].sum().sum())
18
``````
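
As background on why the built-in `sum` from the question fails: iterating over a DataFrame yields its column labels rather than its values, so `sum(df)` ends up trying to add the integer `0` to the string `'a'`. A minimal sketch, reusing the `df` from the question (the variable names `labels`, `builtin_failed` and `total_ab` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# Iterating over a DataFrame walks its column labels, not its cell values.
labels = list(df)

try:
    # The built-in sum starts from 0 and then tries 0 + 'a'.
    sum(df)
    builtin_failed = False
except TypeError:
    builtin_failed = True

# The pandas methods operate on the values instead.
total_ab = df[['a', 'b']].sum().sum()

print(labels)
print(builtin_failed)
print(total_ab)
```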

You can also sum row-wise first with `axis=1`, then sum the resulting `Series`:

``````
print(df[['a','b']].sum(axis=1))
0    3
1    6
2    9
dtype: int64

print(df[['a','b']].sum(axis=1).sum())
18
``````

Thanks to pirSquared for another solution: convert the selected columns to a NumPy array with `values`, then call `sum` on it:

``````
print(df[['a','b']].values.sum())
18
``````
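
In pandas 0.24 and later, `to_numpy()` does the same job as `values` and is what the pandas documentation recommends for new code. A small sketch, assuming a pandas version that provides `to_numpy()` (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# Both expressions produce a 2-D NumPy array of the selected columns;
# ndarray.sum() with no axis argument adds up every element.
total_values = df[['a', 'b']].values.sum()
total_numpy = df[['a', 'b']].to_numpy().sum()

print(total_values)
print(total_numpy)
```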

To sum the entire DataFrame, apply the same double `sum` without selecting any columns:

``````
print(df.sum().sum())
21
``````
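
If this comes up often, the pattern could be wrapped in a small helper. The function `sum_columns` below is hypothetical (it is not part of pandas) and assumes that every selected column is numeric:

```python
import pandas as pd


def sum_columns(frame, columns=None):
    """Hypothetical helper: total of the given columns, or of the whole frame."""
    selected = frame if columns is None else frame[list(columns)]
    # DataFrame.sum() gives per-column sums; Series.sum() adds them up.
    return selected.sum().sum()


df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

print(sum_columns(df, ['a', 'b']))  # 18
print(sum_columns(df))              # 21
```

Passing `columns=None` keeps the "sum everything" case and the "sum a subset" case behind one call.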