nguyenistheloneliestnumber nguyenistheloneliestnumber -4 years ago 103
Python Question

Conditionally setting rows in pandas groupby

I have a (simplified) dataframe like:

+--------+-----------+-------+
| type   | estimated | value |
+--------+-----------+-------+
| type_a | TRUE      | 1     |
| type_a | TRUE      | 2     |
| type_a |           | 3     |
| type_b |           | 4     |
| type_b |           | 5     |
| type_b |           | 6     |
+--------+-----------+-------+


I'd like to group and sum it into two rows:

+--------+-----------+-------+
| type   | estimated | value |
+--------+-----------+-------+
| type_a | TRUE      | 6     |
| type_b |           | 15    |
+--------+-----------+-------+


However, I want the 'estimated' column of each grouped row to be TRUE if any of the rows grouped to form it were estimated. If I include the 'estimated' column in the group by, rows with different 'estimated' values won't be grouped together.

My idea was to iterate through each group, e.g. (pseudocode)

grouped = df.groupby('type')
for group in grouped:
    group['flag'] = 0
    for row in group:
        if row['estimated'] == True:
            group['flag'] = 1


Then, after grouping, I could set estimated = True on every row with a non-zero 'flag'.

I'm having some trouble figuring out how to iterate through rows of groups, and the solution seems pretty hacky. Also you shouldn't edit something you're iterating over. Is there a solution/better way?

Answer Source

You want groupby with agg, using 'any' for the 'estimated' column and 'sum' for 'value':

df.groupby('type').agg(dict(estimated='any', value='sum')).reset_index()

     type  value estimated
0  type_a      6      True
1  type_b     15     False
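
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the blank 'estimated' cells in the question are missing values that mean "not estimated", so it converts the column to plain booleans first. It also uses named aggregation (available in pandas 0.25 and later) so the output column order is explicit. The variable name result is my own choice, not something from the question.

```python
import pandas as pd

# Sample frame from the question; blank 'estimated' cells are assumed to be missing values.
df = pd.DataFrame({
    "type": ["type_a"] * 3 + ["type_b"] * 3,
    "estimated": [True, True, None, None, None, None],
    "value": [1, 2, 3, 4, 5, 6],
})

# Assumption: a blank means "not estimated", so turn the column into plain booleans.
df["estimated"] = df["estimated"].eq(True)

# One row per type: 'estimated' is True if any row in the group was, 'value' is summed.
result = df.groupby("type", as_index=False).agg(
    estimated=("estimated", "any"),
    value=("value", "sum"),
)
print(result)
```

This gives one row for type_a (estimated True, value 6) and one for type_b (estimated False, value 15), with no explicit loop over the groups and nothing edited while iterating.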