eleanor eleanor - 1 year ago 60
Python Question

Create columns of lists from DataFrameGroupBy object

I would like to create a groupby dataframe and group row in multiple lists :
I have the following dataframe :

list_date = [1,1,2,2,2,3,3,3,4,5]
list_request = [2,2,2,3,3,2,3,3,3,3]
list_users = [1,3,7,1,7,3,4,9,7,9]
list_count = [1,1,2,3,1,3,1,2,1,1]
df = pd.DataFrame({'date':list_date,'request':list_request,'users':list_users, 'count':list_count})

count date request users
0 1 1 2 1
1 1 1 2 3
2 2 2 2 7
3 3 2 3 1
4 1 2 3 7
5 3 3 2 3
6 1 3 3 4
7 2 3 3 9
8 1 4 3 7
9 1 5 3 9


and I want to have the following result (grouping by ['date','request']):

count date request users
0 [1, 1] 1 2 [1, 3]
1 [2] 2 2 [7]
2 [3, 1] 2 3 [1, 7]
3 [3] 3 2 [3]
4 [1, 2] 3 3 [4, 9]
5 [1] 4 3 [7]
6 [1] 5 3 [9]


If I use the solution here : grouping rows in list in pandas groupby
I can create 1 list but how can I create 2 lists at the same time ?
Something like :

df = df.groupby(['date','request'])['users','count'].apply(list).reset_index()


And I need something efficient as in reality I have millions of rows...

Thank you!

Answer Source

You can apply pd.Series and then rename your columns:

In [299]: df.groupby(['date', 'request']).apply(lambda x: pd.Series((x['count'].tolist(), x['users'].tolist()))).reset_index().rename(columns={0 : 'count', 1 : 'users'})
Out[299]: 
   date  request   count   users
0     1        2  [1, 1]  [1, 3]
1     2        2     [2]     [7]
2     2        3  [3, 1]  [1, 7]
3     3        2     [3]     [3]
4     3        3  [1, 2]  [4, 9]
5     4        3     [1]     [7]
6     5        3     [1]     [9]

Another solution is applying list with df.apply:

In [364]: grouped_df = df.groupby(['date', 'request'])

In [365]: pd.DataFrame({ 'count' : grouped_df['count'].apply(list), 'users' : grouped_df['users'].apply(list) }).reset_index() #, columns=['count', 'users'])
Out[365]: 
   date  request   count   users
0     1        2  [1, 1]  [1, 3]
1     2        2     [2]     [7]
2     2        3  [3, 1]  [1, 7]
3     3        2     [3]     [3]
4     3        3  [1, 2]  [4, 9]
5     4        3     [1]     [7]
6     5        3     [1]     [9]

Better solutions may be seen here (I asked a question): Converting each grouped column in DataFrameGroupBy object to a list

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download