KSQ KSQ - 1 year ago 146
Python Question

(Python, DataFrame): Record the average of all numbers in a column that are smaller than the n'th percentile

I have a DataFrame similar to the below and would like to create a DataFrame or series that looks more like the second table.

For example: I would find the nth percentile of column A, then take the average of all numbers in A that are less than the nth percentile.

I've used the code below to get the average and range of each column but seem to be missing something to get the conditional average.

min = df.min(axis='index')

max = df.max(axis='index')

mean = df.mean(axis = 'index')

df[df < np.percentile(df, 0.4)].mean() #this doesnt seem to work and I beliebe
gives the average of every row

Table 1

Date A B C D E F
02/10/2017 10 5 1 2 1 1
01/10/2017 10 4 9 4 3 5
30/09/2017 4 8 5 6 2 4
29/09/2017 8 2 7 9 10 5
28/09/2017 3 8 2 7 10 8
27/09/2017 7 3 8 9 9 7
26/09/2017 4 1 2 9 3 4
25/09/2017 10 1 6 6 3 5
24/09/2017 8 3 5 5 6 7
23/09/2017 7 9 5 7 1 3
22/09/2017 2 9 10 5 8 1


Table 2

Index Avg<40th Percentile
A 3.25
B 1.333333333
C 1.666666667
D 4
E 1.333333333
F 1.666666667

Answer Source

Use

df.where(df < df.quantile(0.4)).mean()

Date         NaN
A       3.250000
B       1.333333
C       1.666667
D       4.000000
E       1.333333
F       1.666667
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download