Pauline - 8 months ago

Python Question

So say I have the following table:

`In [2]: df = pd.DataFrame({'a': [1,2,3], 'b':[2,4,6], 'c':[1,1,1]})`

In [3]: df

Out[3]:

a b c

0 1 2 1

1 2 4 1

2 3 6 1

I can sum `a` and `b` this way:

`In [4]: sum(df['a']) + sum(df['b'])`

Out[4]: 18

However, this is not very convenient for a larger DataFrame, where you have to sum many columns together.

Is there a neater way to sum columns (similar to the below)? What if I want to sum the entire DataFrame without specifying the columns?

`In [4]: sum(df[['a', 'b']]) #that will not work!`

Out[4]: 18

`In [4]: sum(df) #that will not work!`

Out[4]: 21

Answer

I think you can use a double `sum`: first `DataFrame.sum` creates a `Series` of column sums, then `Series.sum` returns the sum of that `Series`:

```
print (df[['a','b']].sum())
a     6
b    12
dtype: int64
print (df[['a','b']].sum().sum())
18
```
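
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the double `sum`, rebuilding the `df` from the question. The variable names `column_sums` and `total` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# DataFrame.sum() adds up each selected column and returns a Series
# indexed by column name
column_sums = df[['a', 'b']].sum()

# Series.sum() collapses that Series into a single scalar
total = column_sums.sum()

print(total)  # 18
```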

You can also sum across each row first with `axis=1`, then sum the resulting `Series`:

```
print (df[['a','b']].sum(axis=1))
0    3
1    6
2    9
dtype: int64
print (df[['a','b']].sum(axis=1).sum())
18
```
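
As a rough illustration of what `axis` changes, the sketch below computes both intermediate `Series` and checks that they reduce to the same total. The names `subset`, `per_column` and `per_row` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})
subset = df[['a', 'b']]

# axis=0 (the default) sums down each column: one value per column
per_column = subset.sum(axis=0)

# axis=1 sums across each row: one value per row
per_row = subset.sum(axis=1)

# Either intermediate Series adds up to the same grand total
print(per_column.sum(), per_row.sum())  # 18 18
```

Which order you pick mostly depends on whether the per-column or per-row subtotals are useful to you along the way.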

Thanks to pirSquared for another solution: convert `df` to a NumPy array with `values` and then call `sum`:

```
print (df[['a','b']].values.sum())
18
```
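
On newer pandas versions (0.24 and later), the documentation recommends `DataFrame.to_numpy()` over `.values`. A small sketch of the same idea under that assumption, with `arr` and `total` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# Convert the selected columns to a 2-D NumPy array (3 rows, 2 columns)
arr = df[['a', 'b']].to_numpy()

# ndarray.sum() with no axis argument adds up every element at once
total = arr.sum()

print(total)  # 18
```

Because the array sum runs over all elements in one call, there is no intermediate `Series` to build.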

To sum the entire DataFrame without specifying the columns, call `sum` twice on `df` itself:

```
print (df.sum().sum())
21
```
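
As for why the built-in `sum(df)` from the question fails: iterating over a DataFrame yields its column labels rather than its values, so Python ends up trying to add the integer `0` to the string `'a'`. A short sketch that demonstrates this (the names `labels`, `error` and `grand_total` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 4, 6], 'c': [1, 1, 1]})

# Iterating over a DataFrame gives the column labels, not the cell values
labels = list(df)
print(labels)  # ['a', 'b', 'c']

error = None
try:
    # The built-in sum starts from 0 and tries 0 + 'a', which is invalid
    sum(df)
except TypeError as exc:
    error = exc
    print("built-in sum failed:", exc)

# The pandas methods operate on the values instead
grand_total = df.sum().sum()
print(grand_total)  # 21
```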

Source (Stackoverflow)