piRSquared - 1 year ago 88

# Calculate useful order statistics using `describe` with NaN in a DataFrame

Consider the DataFrame `df`:

``````
import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
``````

`describe` calculates useful statistics:

``````
df.describe()
``````
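As a quick cross-check, here is a sketch (assuming a reasonably recent pandas) showing that, for a NaN-free frame, the `25%`, `50%` and `75%` rows reported by `describe` match what `DataFrame.quantile` returns with its default linear interpolation. The names `desc_pct` and `direct_pct` are illustrative.

``````python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))

# Percentile rows produced by describe (labelled '25%', '50%', '75%')
desc_pct = df.describe().loc[['25%', '50%', '75%']]

# The same order statistics computed directly with quantile
direct_pct = df.quantile([0.25, 0.5, 0.75])

# Compare values only; the row labels differ ('25%' vs 0.25)
print(np.allclose(desc_pct.values, direct_pct.values))
``````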

Now introduce some `NaN` values and consider `d1`:

``````
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))
d1.describe()
``````
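To see what `mask` did, a small sketch: `mask` replaces a value with NaN wherever the boolean array is `True`, so roughly 20% of cells become missing while the rest keep their original values. The `count` row of `describe` is simply the number of non-missing values per column. Variable names here are illustrative.

``````python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))

# mask sets a cell to NaN wherever the boolean array is True (about 20% here)
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))

# Non-missing values per column (the 'count' row of describe)
non_missing = d1.count()
# Missing values per column
missing = d1.isna().sum()

print(pd.DataFrame({'count': non_missing, 'NaN': missing}))
``````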

I don't get calculations for the `['25%', '50%', '75%']` percentiles.

How do I get these conveniently using pre-existing functions?
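One pre-existing function that targets these rows directly is `DataFrame.quantile`, which skips NaN by default. A minimal sketch, assuming the same `df`/`d1` setup as above; the comparison with NumPy's `np.nanpercentile` is only there to show that missing cells are ignored rather than propagated.

``````python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))

# quantile ignores NaN in each column and returns one row per requested level
pct = d1.quantile([0.25, 0.5, 0.75])
# Relabel the rows to match describe's naming
pct.index = ['25%', '50%', '75%']

# Independent check: NumPy's NaN-aware percentile, column by column
check = np.nanpercentile(d1.values, [25, 50, 75], axis=0)
print(np.allclose(pct.values, check))
print(pct)
``````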

A cleaner way is to use the `include` argument, for example:

``````
d1.describe(include=['float64'])

Out[214]:
            A       B       C       D       E       F       G       H       I       J
count 70.0000 77.0000 81.0000 82.0000 78.0000 81.0000 80.0000 82.0000 75.0000 81.0000
mean   0.0572 -0.1383 -0.1550 -0.0658  0.0074 -0.0508 -0.0253 -0.0202 -0.1054  0.1019
std    0.9580  0.9447  1.0263  0.9393  0.8976  0.9207  0.9993  0.9474  1.0305  0.7382
min   -2.3045 -2.3190 -2.2027 -2.8470 -2.7149 -2.4345 -2.3619 -2.0283 -2.1609 -1.6739
25%   -0.5287 -0.6854 -0.9155 -0.8202 -0.5456 -0.6045 -0.6823 -0.6192 -0.9222 -0.3186
50%    0.0581 -0.2999 -0.1799 -0.0525  0.0181 -0.1502 -0.1421 -0.0458 -0.0108  0.1053
75%    0.5510  0.4997  0.5064  0.7505  0.5904  0.5217  0.6515  0.5790  0.6261  0.7041
max    2.6967  2.3198  2.5974  1.8385  2.2225  2.6081  2.4215  2.0045  2.1077  1.9469
``````
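A quick way to check that the percentile rows are populated is to compare them with `quantile` on the same data. This is a sketch that assumes a recent pandas release, where NaN cells are skipped when these rows are computed; the names `summary` and `pct_rows` are illustrative.

``````python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))

# Restrict describe to float64 columns (all ten columns here)
summary = d1.describe(include=['float64'])
pct_rows = summary.loc[['25%', '50%', '75%']]

# Every percentile cell should be a real number, not NaN
print(pct_rows.notna().all().all())
# and should agree with the NaN-skipping quantile computation
print(np.allclose(pct_rows.values, d1.quantile([0.25, 0.5, 0.75]).values))
``````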

You could also use the `exclude` argument, but it is trickier with NaN values. Passing `'bool'` works:

``````
d1.describe(exclude=['bool'])

Out[221]:
            A       B       C       D       E       F       G       H       I       J
count 70.0000 77.0000 81.0000 82.0000 78.0000 81.0000 80.0000 82.0000 75.0000 81.0000
mean   0.0572 -0.1383 -0.1550 -0.0658  0.0074 -0.0508 -0.0253 -0.0202 -0.1054  0.1019
std    0.9580  0.9447  1.0263  0.9393  0.8976  0.9207  0.9993  0.9474  1.0305  0.7382
min   -2.3045 -2.3190 -2.2027 -2.8470 -2.7149 -2.4345 -2.3619 -2.0283 -2.1609 -1.6739
25%   -0.5287 -0.6854 -0.9155 -0.8202 -0.5456 -0.6045 -0.6823 -0.6192 -0.9222 -0.3186
50%    0.0581 -0.2999 -0.1799 -0.0525  0.0181 -0.1502 -0.1421 -0.0458 -0.0108  0.1053
75%    0.5510  0.4997  0.5064  0.7505  0.5904  0.5217  0.6515  0.5790  0.6261  0.7041
max    2.6967  2.3198  2.5974  1.8385  2.2225  2.6081  2.4215  2.0045  2.1077  1.9469
``````
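Because every column of `d1` is `float64`, excluding `'bool'` and including `'float64'` should select exactly the same columns. A small sketch to confirm that on the example data (assuming a recent pandas; variable names are illustrative):

``````python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))
d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))

# All ten columns are float64, so there is nothing boolean to drop
print(d1.dtypes.unique())

by_include = d1.describe(include=['float64'])
by_exclude = d1.describe(exclude=['bool'])

# Both calls select the same columns and therefore return the same summary
print(by_include.equals(by_exclude))
``````

With mixed dtypes the two calls would diverge: `exclude=['bool']` keeps any non-boolean column, while `include=['float64']` keeps only float columns.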