Yakym Pirozhenko - 1 month ago
Python Question

Idiomatic clip on quantile for DataFrame

I am trying to clip outliers in a DataFrame based on the quantiles of each column. Let's say

df = pd.DataFrame(np.random.randn(10, 2))  # assumes `import numpy as np` and `import pandas as pd`; the `pd.np` alias is deprecated


          0         1
0  0.734355  0.594992
1 -0.745949  0.597601
2  0.295606  0.972196
3  0.474539  1.462364
4  0.238838  0.684790
5 -0.659094  0.451718
6  0.675360 -1.286660
7  0.713914  0.135179
8 -0.435309 -0.344975
9  1.200617 -0.392945


I currently use

df_clipped = df.apply(lambda col: col.clip(*col.quantile([0.05,0.95]).values))


          0         1
0  0.734355  0.594992
1 -0.706865  0.597601
2  0.295606  0.972196
3  0.474539  1.241788
4  0.238838  0.684790
5 -0.659094  0.451718
6  0.675360 -0.884488
7  0.713914  0.135179
8 -0.435309 -0.344975
9  0.990799 -0.392945


This works, but I am wondering if there is a more elegant pandas/numpy-based approach.

Answer Source

You can call clip on the whole DataFrame. df.quantile returns a Series indexed by column label, and axis=1 aligns those bounds with the columns:

df.clip(df.quantile(0.05), df.quantile(0.95), axis=1)
Out: 
          0         1
0  0.734355  0.594992
1 -0.706864  0.597601
2  0.295606  0.972196
3  0.474539  1.241788
4  0.238838  0.684790
5 -0.659094  0.451718
6  0.675360 -0.884488
7  0.713914  0.135179
8 -0.435309 -0.344975
9  0.990799 -0.392945
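
To check that the one-liner above matches the apply-based version from the question, something like the following sketch may help. The seed and the variable names (rng, lower, upper, clipped_apply, clipped_vec) are illustrative assumptions, not part of the original post, and the random data will differ from the frame shown above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 2)))

# Per-column bounds; each is a Series indexed by column label.
lower = df.quantile(0.05)
upper = df.quantile(0.95)

# Approach from the question: clip each column with its own scalar bounds.
clipped_apply = df.apply(lambda col: col.clip(*col.quantile([0.05, 0.95]).values))

# Approach from the answer: axis=1 aligns the bounds with the column labels.
clipped_vec = df.clip(lower, upper, axis=1)

# Raises AssertionError if the two results differ beyond float tolerance.
pd.testing.assert_frame_equal(clipped_apply, clipped_vec)
```

Computing the quantiles once and reusing them also makes it easy to apply the same bounds to another frame with the same columns, e.g. clipping test data with bounds estimated on training data.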