Stanko Stanko - 7 months ago 15
Python Question

Sum over rows (but not all), give other rows default value

Say I have following dataframe (but keep in mind this could have 100+ rows and columns):

table

I only want to sum the values of some rows that met a condition, in this case of the rows that have a 2 for stream. For the other rows I want them to get a default value, for example 0.

This is what I tried:

cols = [col for col in dataFrame.columns if col != 'stream']
dataFrame.loc[dataFrame['stream'] == 2, cols].sum(axis=1)


But it doesn't get the result I want. What's wrong with my code?

Answer

I think you are very close, you need only add new column sum and then fillna with 0:

cols = [col for col in df1.columns if col != 'stream']
print cols
['feat', 'another_feat']

df1['sum'] = df1.loc[df1['stream'] == 2, cols ].sum(axis=1)
df1['sum'] = df1['sum'].fillna(0)
print df1
   stream  feat  another_feat   sum
a       1     8             4   0.0
b       2     5             5  10.0
c       2     7             7  14.0
d       3     3             2   0.0

If all values are int, last you can cast float to int by astype:

df1['sum'] = df1['sum'].fillna(0).astype(int)
print df1
   stream  feat  another_feat  sum
a       1     8             4    0
b       2     5             5   10
c       2     7             7   14
d       3     3             2    0

Another solution with numpy.where:

df1['sum'] = np.where(df1['stream'] == 2, df1[cols].sum(axis=1), 0)
print df1
   stream  feat  another_feat  sum
a       1     8             4    0
b       2     5             5   10
c       2     7             7   14
d       3     3             2    0