user1017373 - 1 month ago 5x
Python Question

# Calculating mean for sub-set of dataframe based on unique row names

I have a dataframe which looks like follows,

`````` df.head()

Sym P1  P2  P3  P4  P5  B1  B2  B3  B4  B5
AA  7.86    8.86    9.86    10.86   11.86   0.7768  1.7768  2.7768  3.7768  4.7768
AA  7.86    8.86    9.86    10.86   11.86   0.8664  1.8664  2.8664  3.8664  4.8664
AA  7.86    8.86    9.86    10.86   11.86   0.874534    1.874534    2.874534    3.874534    4.874534
BB  5.8 6.8 7.8 8.8 9.8 7.42    8.42    9.42    10.42   11.42
BB  5.8 6.8 7.8 8.8 9.8 0.1434  1.1434  2.1434  3.1434  4.1434
CC  0.421   1.421   2.421   3.421   4.421   6.78    7.78    8.78    9.78    10.78
CC  0.421   1.421   2.421   3.421   4.421   8.43    9.43    10.43   11.43   12.43
VV  3.25    4.25    5.25    6.25    7.25    0.97    1.97    2.97    3.97    4.97
VV  3.25    4.25    5.25    6.25    7.25    0.2 1.2 2.2 3.2 4.2
VV  3.25    4.25    5.25    6.25    7.25    0.45    1.45    2.45    3.45    4.45
VV  3.25    4.25    5.25    6.25    7.25    0.78    1.78    2.78    3.78    4.78
``````

And what I am aiming is to get the mean of the second half(Columns Starting with name B1..B5) of the data frame based on the unique values in column 'sym' and make a new dataframe which looks as follows.

``````Sym P1  P2  P3  P4  P5  B1  B2  B3  B4  B5
AA  7.86    8.86    9.86    10.86   11.86   0.8664  1.8664  2.8664  3.8664  4.8664
BB  5.8 6.8 7.8 8.8 9.8 3.7817  4.7817  5.7817  6.7817  7.7817
CC  0.421   1.421   2.421   3.421   4.421   7.605   8.605   9.605   10.605  11.605
VV  3.25    4.25    5.25    6.25    7.25    0.615   1.615   2.615   3.615   4.615
``````

I tried to used groupby for that to get the unique sym .Would be great if someone could suggest a simple way to proceed
Thank you

Answer

Use `filter` and `groupby`

``````transformed = df.filter(like='B').groupby(df.Sym).transform(np.mean)
df.loc[:, df.columns.str.contains('B')] = transformed
df
``````

Comments