farhawa - 2 months ago 5
Python Question

# How pandas.DataFrame.groupby actually work

I need to group a

`pandas.DataFrame`
by one,two and three columns and compute the mean of the "groups".

Something like:

``````    col1  col2  col3  col4
0   A     17    R     3
1   B     5     T     7
2   F     25    R     11
3   A     33    R     15
4   B     17    T     19
5   F     25    R     23
6   F     25    E     27
``````

Group by one columns: col1

Here I want the result to be (col3 is dropped as it's not numeric):

``````                  col2         col4
col1 = A | 0  (17+33)/2    (3+15)/2
col1 = B | 1  (5+17)/2     (7+19)/2
col1 = F | 2  (25+25+25)/2    (11+23)+27/2
``````

Group by one columns: col1 & col3

``````                             col2         col4
col1 = A & col3 = R | 0  (17+33)/2    (3+15)/2
col1 = B & col3 = T | 1  (5+17)/2     (7+19)/2
col1 = F & col3 = R | 2  (25+25)/2    (11+23)/2
col1 = F & col3 = E | 4  25          27
``````

And the same thing for group by 3 columnms.

I found the
`pandas.DataFrame.groupby().mean()`
method but I can't figure out how it works exactly.

For example, for this simple dataframe:

``````In [1]: df
Out[2]:
v1  v2  v3  v4
0   0  17   2   3
1   4   5   6   7
2   8  25  10  11
3  12  33  14  15
4  16  17  18  19
5  20  25  22  23
6  24  25  26  27
7  28  29  30  31
8  32   5  34  35
9  36   5  38  39
In [2]: df.groupby(["v2"]).mean()
Out[2]:
v1         v3         v4
v2
5   24.000000  26.000000  27.000000
17   8.000000  10.000000  11.000000
25  17.333333  19.333333  20.333333
29  28.000000  30.000000  31.000000
33  12.000000  14.000000  15.000000

## For this first case it's ok...

In [3]: df.groupby(["v2","v3"]).mean()
Out[3]:
v1  v4
v2 v3
5  6    4   7
34  32  35
38  36  39
17 2    0   3
18  16  19
25 10   8  11
22  20  23
26  24  27
29 30  28  31
33 14  12  15
``````

How exactly did the
`groupby`
function work and why this result (out[3]) don't have the same length as the original dataframe (as there is no commun couple (v2, v3) in the dataframe) ???

For your first 2 examples at the top, here is the syntax you are looking for:

``````>>>df.groupby(['col1'])['col2', 'col4'].mean()

col2       col4
col1
A       25   9.000000
B       11  13.000000
F       25  20.333333

>>>df.groupby(['col1','col3'])['col2', 'col4'].mean()

col2  col4
col1 col3
A    R       25     9
B    T       11    13
F    E       25    27
R       25    17
``````

Does that help you get the group that you are looking for?

Source (Stackoverflow)