
Python Question

In pandas, I have been looking for a general pattern to group a DataFrame by a certain column, perform non-trivial operations on the groups, and then reconstitute the groups back into one big DataFrame (by effectively stacking them on top of each other).

Imagine I have a DataFrame `df`:

```
+----+-------+---+---+---+
|    |   A   | B | C | D |
+----+-------+---+---+---+
|  0 | Green | 1 | 4 | 5 |
|  1 | Red   | 2 | 3 | 2 |
|  2 | Red   | 1 | 4 | 3 |
|  3 | Green | 2 | 2 | 2 |
|  4 | Green | 1 | 1 | 1 |
|  5 | Blue  | 2 | 1 | 5 |
|  6 | Red   | 2 | 1 | 6 |
|  7 | Blue  | 7 | 8 | 9 |
|  8 | Green | 7 | 6 | 5 |
|  9 | Red   | 0 | 9 | 0 |
| 10 | Blue  | 4 | 5 | 4 |
+----+-------+---+---+---+
```

I would like to groupby() column A and then perform an operation on each group. Typically this operation involves creating new rows by comparing the value in one row with the values in the other rows of the group, so I wouldn't say it can be done with a lambda function applied to the groups. Then I want to put these groups back together into one DataFrame, effectively in the same format as above but with the inserted rows.

My general approach so far has been to do it the "slow and stupid" way, i.e.:

```
group_list = []
g = df.groupby("A")

for i, group in g:
    # Perform some weird operation on group that can't really be
    # reduced to a lambda function applied to each group.
    group_list.append(group)

reconstituted = group_list[0]
for i in range(1, len(group_list)):
    reconstituted = reconstituted.append(group_list[i], ignore_index=True)
```

Clearly this isn't particularly pandas-esque, so that is my question - what is a better way of operating on the groups themselves and then reconstituting them?
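For concreteness, here is a runnable sketch of that flow on the sample data. The per-group operation (appending one row of column-wise maxima) is purely hypothetical, standing in for whatever the real row-creating logic is, and `pd.concat` stands in for `DataFrame.append` (which was removed in pandas 2.0):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Green", "Red", "Red", "Green", "Green", "Blue",
          "Red", "Blue", "Green", "Red", "Blue"],
    "B": [1, 2, 1, 2, 1, 2, 2, 7, 7, 0, 4],
    "C": [4, 3, 4, 2, 1, 1, 1, 8, 6, 9, 5],
    "D": [5, 2, 3, 2, 1, 5, 6, 9, 5, 0, 4],
})

group_list = []
for name, group in df.groupby("A"):
    # Hypothetical "weird operation": append one new row per group
    # holding the column-wise maxima of that group.
    new_row = group[["B", "C", "D"]].max().to_frame().T
    new_row.insert(0, "A", name)
    group_list.append(pd.concat([group, new_row], ignore_index=True))

# Stack the modified groups back into one DataFrame.
reconstituted = pd.concat(group_list, ignore_index=True)
```

Each of the three groups gains one row, so the result has 14 rows instead of the original 11.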


Answer Source

Without knowing what your function does, if all you want is to join the groups back together, you can use `pd.concat`:

```
df_new = pd.concat(group_list)
```

MVCE:

```
In [77]: df1
Out[77]:
   0
0  a
1  b

In [78]: df2
Out[78]:
   0
0  c
1  d

In [79]: pd.concat([df1, df2], ignore_index=True)
Out[79]:
   0
0  a
1  b
2  c
3  d
```

However, I would urge you to consider a different technique that doesn't involve explicitly splitting the groups and working on them separately; that can be very inefficient.
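As a hedged sketch of that direction: the "can't be a lambda" constraint doesn't rule out `groupby().apply()`, which accepts any named function that takes a whole group and returns a DataFrame. The summary-row operation below is hypothetical, chosen only to show the shape of the pattern:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Green", "Red", "Red", "Green", "Green", "Blue",
          "Red", "Blue", "Green", "Red", "Blue"],
    "B": [1, 2, 1, 2, 1, 2, 2, 7, 7, 0, 4],
    "C": [4, 3, 4, 2, 1, 1, 1, 8, 6, 9, 5],
    "D": [5, 2, 3, 2, 1, 5, 6, 9, 5, 0, 4],
})

def add_summary_row(group):
    # Hypothetical row-creating operation: the function sees the whole
    # group at once, so it can compare rows against each other freely.
    new_row = group[["B", "C", "D"]].max().to_frame().T
    out = pd.concat([group[["B", "C", "D"]], new_row], ignore_index=True)
    out.insert(0, "A", group.name)  # apply() sets .name to the group key
    return out

result = (df.groupby("A", group_keys=False)
            .apply(add_summary_row)
            .reset_index(drop=True))
```

This keeps the split/transform/recombine cycle inside pandas rather than in an explicit Python loop, and the recombination comes for free.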
