Levine Levine - 1 month ago 7
Python Question

Add Calculated Column into DF and Plot 2 Lines

I start with the dataframe below, where each row is a new trial:

   test_group range  success
0        test   1-5        1
1        test   1-5        0
2        test   1-5        1
3        test  6-10        1
4        test  6-10        0
5        test  6-10        0
6     control   1-5        0
7     control   1-5        0
8     control   1-5        1
9     control  6-10        1
10    control  6-10        1
11    control  6-10        1


I want to compute the mean of success, grouped by test_group and range.

To do so, I'd write the following code:

df = df.groupby(['test_group', 'range']).success.mean()


My result looks like the following:

test_group  range
test        1-5      0.66
            6-10     0.33
control     1-5      0.33
            6-10     1.00


Ideally, I want my final output to look like the following so that I can plot both test groups on the same chart, with the x-axis being each range and the y-axis being the success-rate:

   test_group range  success-rate
0        test   1-5          0.66
1        test   1-5          0.66
2        test   1-5          0.66
3        test  6-10          0.33
4        test  6-10          0.33
5        test  6-10          0.33
6     control   1-5          0.33
7     control   1-5          0.33
8     control   1-5          0.33
9     control  6-10          1.00
10    control  6-10          1.00
11    control  6-10          1.00

Answer

You can use the transform() method:

In [35]: df['success-rate'] = df.groupby(['test_group','range'])['success'].transform('mean')

In [36]: df
Out[36]:
   test_group range  success  success-rate
0        test   1-5        1      0.666667
1        test   1-5        0      0.666667
2        test   1-5        1      0.666667
3        test  6-10        1      0.333333
4        test  6-10        0      0.333333
5        test  6-10        0      0.333333
6     control   1-5        0      0.333333
7     control   1-5        0      0.333333
8     control   1-5        1      0.333333
9     control  6-10        1      1.000000
10    control  6-10        1      1.000000
11    control  6-10        1      1.000000

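GroupBy.transform() computes the aggregate within each group and broadcasts the result back to every original row, so the returned Series has the same index and length as the input DataFrame.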
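
The question also asks for a chart with one line per test group. That part is not covered above, so here is a minimal sketch of one way to do it, assuming matplotlib is installed. The names rates and plot_rates are hypothetical and only used for illustration. For plotting, it is usually easier to reshape the group means so that each range is a row and each test group is a column, rather than to plot the repeated per-row values.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "test_group": ["test"] * 6 + ["control"] * 6,
    "range": (["1-5"] * 3 + ["6-10"] * 3) * 2,
    "success": [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Per-row group mean, same length as df (the approach from the answer).
df["success-rate"] = (
    df.groupby(["test_group", "range"])["success"].transform("mean")
)

# Wide table for plotting: one row per range, one column per test group.
rates = (
    df.groupby(["range", "test_group"])["success"]
      .mean()
      .unstack("test_group")
)


def plot_rates(rates):
    """Draw one line per test group (hypothetical helper, needs matplotlib)."""
    ax = rates.plot(marker="o")
    ax.set_xlabel("range")
    ax.set_ylabel("success-rate")
    return ax
```

Calling plot_rates(rates) and then matplotlib.pyplot.show() should draw the two lines on the same axes, with the ranges on the x-axis.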
