farhawa farhawa - 4 months ago 7
Python Question

How pandas.DataFrame.groupby actually work

I need to group a

pandas.DataFrame
by one,two and three columns and compute the mean of the "groups".

Something like:

col1 col2 col3 col4
0 A 17 R 3
1 B 5 T 7
2 F 25 R 11
3 A 33 R 15
4 B 17 T 19
5 F 25 R 23
6 F 25 E 27


Group by one columns: col1

Here I want the result to be (col3 is dropped as it's not numeric):

col2 col4
col1 = A | 0 (17+33)/2 (3+15)/2
col1 = B | 1 (5+17)/2 (7+19)/2
col1 = F | 2 (25+25+25)/2 (11+23)+27/2


Group by one columns: col1 & col3

col2 col4
col1 = A & col3 = R | 0 (17+33)/2 (3+15)/2
col1 = B & col3 = T | 1 (5+17)/2 (7+19)/2
col1 = F & col3 = R | 2 (25+25)/2 (11+23)/2
col1 = F & col3 = E | 4 25 27


And the same thing for group by 3 columnms.

I found the
pandas.DataFrame.groupby().mean()
method but I can't figure out how it works exactly.

For example, for this simple dataframe:

In [1]: df
Out[2]:
v1 v2 v3 v4
0 0 17 2 3
1 4 5 6 7
2 8 25 10 11
3 12 33 14 15
4 16 17 18 19
5 20 25 22 23
6 24 25 26 27
7 28 29 30 31
8 32 5 34 35
9 36 5 38 39
In [2]: df.groupby(["v2"]).mean()
Out[2]:
v1 v3 v4
v2
5 24.000000 26.000000 27.000000
17 8.000000 10.000000 11.000000
25 17.333333 19.333333 20.333333
29 28.000000 30.000000 31.000000
33 12.000000 14.000000 15.000000

## For this first case it's ok...

In [3]: df.groupby(["v2","v3"]).mean()
Out[3]:
v1 v4
v2 v3
5 6 4 7
34 32 35
38 36 39
17 2 0 3
18 16 19
25 10 8 11
22 20 23
26 24 27
29 30 28 31
33 14 12 15


How exactly did the
groupby
function work and why this result (out[3]) don't have the same length as the original dataframe (as there is no commun couple (v2, v3) in the dataframe) ???

Answer

For your first 2 examples at the top, here is the syntax you are looking for:

>>>df.groupby(['col1'])['col2', 'col4'].mean()

      col2       col4
col1                 
A       25   9.000000
B       11  13.000000
F       25  20.333333

>>>df.groupby(['col1','col3'])['col2', 'col4'].mean()

           col2  col4
col1 col3            
A    R       25     9
B    T       11    13
F    E       25    27
     R       25    17

Does that help you get the group that you are looking for?