Stanko - 1 year ago 31

Python Question

Say I have following dataframe (but keep in mind this could have 100+ rows and columns):

I only want to sum the values of some rows that met a condition, in this case of the rows that have a **2** for **stream**. For the other rows I want them to get a default value, for example **0**.

This is what I tried:

`cols = [col for col in dataFrame.columns if col != 'stream']`

dataFrame.loc[dataFrame['stream'] == 2, cols].sum(axis=1)

But it doesn't get the result I want. What's wrong with my code?

Answer

I think you are very close, you need only add new column `sum`

and then `fillna`

with `0`

:

```
cols = [col for col in df1.columns if col != 'stream']
print cols
['feat', 'another_feat']
df1['sum'] = df1.loc[df1['stream'] == 2, cols ].sum(axis=1)
df1['sum'] = df1['sum'].fillna(0)
print df1
stream feat another_feat sum
a 1 8 4 0.0
b 2 5 5 10.0
c 2 7 7 14.0
d 3 3 2 0.0
```

If all values are `int`

, last you can cast `float`

to `int`

by `astype`

:

```
df1['sum'] = df1['sum'].fillna(0).astype(int)
print df1
stream feat another_feat sum
a 1 8 4 0
b 2 5 5 10
c 2 7 7 14
d 3 3 2 0
```

Another solution with `numpy.where`

:

```
df1['sum'] = np.where(df1['stream'] == 2, df1[cols].sum(axis=1), 0)
print df1
stream feat another_feat sum
a 1 8 4 0
b 2 5 5 10
c 2 7 7 14
d 3 3 2 0
```