ZICHAO LI - 2 months ago
Python Question

How to groupby and assign an array to a column in python-pandas?

Given a data frame df like this:

a b
2 nan
3 nan
3 nan
4 nan
4 nan
4 nan
5 nan
5 nan
5 nan
5 nan
...


A critical rule is that each number n in column a is repeated in n-1 rows. My expected output is:

a b
2 1
3 1
3 2
4 1
4 2
4 3
5 1
5 2
5 3
5 4
...


So for a given n, the numbers in column b run from 1 to n-1. I tried it this way:

df.groupby('a').apply(lambda x: np.asarray(range(x['a'].unique()[0])))


But the result has one row per group, each holding a whole array, which is not what I want.
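If you want to keep the idea behind this attempt (build a range for each distinct value), the per-group ranges have to be flattened into one column instead of being returned one per group. Here is a minimal sketch, assuming the rows are sorted by a and every n appears exactly n-1 times; if either assumption fails, the values will not line up with the rows. It uses np.arange(1, n) rather than range(n), because range(n) starts at 0 and yields n values, one too many for a group of n-1 rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]})

# One range 1..n-1 for each distinct n, in order of first appearance.
# Assumption: rows are sorted by 'a' and each n occurs exactly n-1 times,
# so the concatenated ranges have the same length and order as the rows.
ranges = [np.arange(1, n) for n in df['a'].unique()]
df['b'] = np.concatenate(ranges)
print(df)
```

This breaks as soon as the data is unsorted or a group has a different size, which is why a per-row counter is the more robust choice.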

Could you please tell me how to implement it? Thanks in advance!

Answer

You need cumcount, which numbers the rows within each group starting from 0; adding 1 makes the count start from 1:

df['b'] = df.groupby('a').cumcount() + 1
print(df)
   a  b
0  2  1
1  3  1
2  3  2
3  4  1
4  4  2
5  4  3
6  5  1
7  5  2
8  5  3
9  5  4
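A self-contained version of the same approach follows. The input is generated from the rule with np.repeat (the upper bound of 5 is only for illustration). A second, shuffled frame shows that cumcount follows the existing row order, so it does not require the rows to be sorted or the groups to be contiguous.

```python
import numpy as np
import pandas as pd

# Build the input from the rule: each n appears n-1 times (n = 2..5 here).
ns = np.arange(2, 6)
df = pd.DataFrame({'a': np.repeat(ns, ns - 1)})

# cumcount numbers rows 0, 1, 2, ... within each group of 'a';
# adding 1 turns that into 1, 2, 3, ...
df['b'] = df.groupby('a').cumcount() + 1
print(df)

# The count follows the original row order, so interleaved groups also work.
shuffled = pd.DataFrame({'a': [5, 2, 5, 3, 5, 3, 5]})
shuffled['b'] = shuffled.groupby('a').cumcount() + 1
print(shuffled)
```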