 KSQ - 2 years ago 190
Python Question

# (Python, DataFrame): Record the average of all numbers in a column that are smaller than the n'th percentile

I have a DataFrame similar to the below and would like to create a DataFrame or series that looks more like the second table.

For example: I would find the nth percentile of column A, then take the average of all numbers in A that are less than the nth percentile.

I've used the code below to get the average and range of each column but seem to be missing something to get the conditional average.

min = df.min(axis='index')

max = df.max(axis='index')

mean = df.mean(axis = 'index')

df[df < np.percentile(df, 0.4)].mean() #this doesnt seem to work and I beliebe
gives the average of every row

Table 1

``````Date    A   B   C   D   E   F
02/10/2017  10  5   1   2   1   1
01/10/2017  10  4   9   4   3   5
30/09/2017  4   8   5   6   2   4
29/09/2017  8   2   7   9   10  5
28/09/2017  3   8   2   7   10  8
27/09/2017  7   3   8   9   9   7
26/09/2017  4   1   2   9   3   4
25/09/2017  10  1   6   6   3   5
24/09/2017  8   3   5   5   6   7
23/09/2017  7   9   5   7   1   3
22/09/2017  2   9   10  5   8   1
``````

Table 2

``````Index   Avg<40th Percentile
A   3.25
B   1.333333333
C   1.666666667
D   4
E   1.333333333
F   1.666666667
`````` djk47463

Use

``````df.where(df < df.quantile(0.4)).mean()

Date         NaN
A       3.250000
B       1.333333
C       1.666667
D       4.000000
E       1.333333
F       1.666667
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download