ritos ritos - 3 years ago 135
Python Question

How to seperate sequential groups in pandas?

The question might be odd, so here's an example.

Suppose my data looks like below:

a 2
a 1
a 1
b 3
b 1
a 4
c 3
c 6
c 7

As you see there are 3 elements a, b, c and we can just groupby & aggregate like:

a 8
b 4
c 16

but what I really want to do is, I want to "SEPERATE" "within" groups.
So the desired result is:

a 4
b 4
a 4
c 16

this means that the first sequential 'a' group and the second 'a' that appears later is treated different. Let me show you simply by a list.


The point is we have to consider the sequence. It's not just using pandas cumcount, which only considers the count of appearance. I want to divide each 'sequences', even within same element group.(first aaa and second aaa is different)

I don't think this is that simple in pandas, though I'm avoiding to do this job by just making explicit function-based logic. In short, I want to do this much more simply. Is there any idea?

Answer Source

You can use the shift-cumsum pattern.

>>> df.groupby((df['A'] != df['A'].shift()).cumsum(), as_index=False).agg(
        {'A': 'first', 'B': 'sum'})
# Output:
#        A   B
#     0  a   4
#     1  b   4
#     2  a   4
#     3  c  16
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download