piRSquared - 1 month ago
Python Question

Calculate useful order statistics using describe with NaN in a DataFrame

Consider the df:


import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randn(100, 10), columns=list('ABCDEFGHIJ'))





describe calculates useful statistics:

df.describe()






Introduce NaN

Now consider d1:


d1 = df.mask(np.random.choice([True, False], df.shape, p=[.2, .8]))
d1.describe()
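
To make the effect of mask easier to see, here is a minimal sketch on a tiny frame. The names small and hide and all the values are made up for illustration; they are not the df and d1 from the question. Cells where the boolean array is True become NaN, and count drops accordingly.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values and the mask below are made up.
small = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                      'B': [10.0, 20.0, 30.0, 40.0]})

# True marks the cells to blank out, like the boolean array passed to df.mask above.
hide = np.array([[True, False],
                 [False, False],
                 [False, True],
                 [False, True]])

masked = small.mask(hide)

# count only tallies non-NaN cells, so it drops wherever values were masked.
counts = masked.count()
print(masked)
print(counts)
```

This is why the count row in d1.describe() is below 100 for every column.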






I get no calculations for ['25%', '50%', '75%'].


How do I get these conveniently using pre-existing functions?

Answer

A much cleaner way would be to use the include argument, such as:

d1.describe(include=['float64'])

Out[214]: 
            A       B       C       D       E       F       G       H       I       J
count 70.0000 77.0000 81.0000 82.0000 78.0000 81.0000 80.0000 82.0000 75.0000 81.0000
mean   0.0572 -0.1383 -0.1550 -0.0658  0.0074 -0.0508 -0.0253 -0.0202 -0.1054  0.1019
std    0.9580  0.9447  1.0263  0.9393  0.8976  0.9207  0.9993  0.9474  1.0305  0.7382
min   -2.3045 -2.3190 -2.2027 -2.8470 -2.7149 -2.4345 -2.3619 -2.0283 -2.1609 -1.6739
25%   -0.5287 -0.6854 -0.9155 -0.8202 -0.5456 -0.6045 -0.6823 -0.6192 -0.9222 -0.3186
50%    0.0581 -0.2999 -0.1799 -0.0525  0.0181 -0.1502 -0.1421 -0.0458 -0.0108  0.1053
75%    0.5510  0.4997  0.5064  0.7505  0.5904  0.5217  0.6515  0.5790  0.6261  0.7041
max    2.6967  2.3198  2.5974  1.8385  2.2225  2.6081  2.4215  2.0045  2.1077  1.9469
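
If you want to double-check the percentile rows, one option is to compare them against DataFrame.quantile, which skips NaN by default. The sketch below uses made-up data (demo, not the d1 above) and assumes a reasonably recent pandas; whether describe fills these rows on its own can depend on the version you have installed.

```python
import numpy as np
import pandas as pd

# Illustrative data only; this is not the d1 from the question.
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.randn(50, 3), columns=list('ABC'))
demo = demo.mask(rng.rand(50, 3) < 0.2)  # blank out roughly 20% of the cells

summary = demo.describe(include=['float64'])

# DataFrame.quantile skips NaN by default, so it gives an independent
# check on the percentile rows reported by describe.
q = demo.quantile([0.25, 0.5, 0.75])
print(summary.loc[['25%', '50%', '75%']])
print(q)
```

If the two printouts disagree, or the describe rows come back as NaN, the quantile result is the one computed with NaN values left out.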

You could also use the exclude argument, but it's tricky with NaN values. Passing 'bool' works:

d1.describe(exclude=['bool'])
Out[221]: 
            A       B       C       D       E       F       G       H       I       J
count 70.0000 77.0000 81.0000 82.0000 78.0000 81.0000 80.0000 82.0000 75.0000 81.0000
mean   0.0572 -0.1383 -0.1550 -0.0658  0.0074 -0.0508 -0.0253 -0.0202 -0.1054  0.1019
std    0.9580  0.9447  1.0263  0.9393  0.8976  0.9207  0.9993  0.9474  1.0305  0.7382
min   -2.3045 -2.3190 -2.2027 -2.8470 -2.7149 -2.4345 -2.3619 -2.0283 -2.1609 -1.6739
25%   -0.5287 -0.6854 -0.9155 -0.8202 -0.5456 -0.6045 -0.6823 -0.6192 -0.9222 -0.3186
50%    0.0581 -0.2999 -0.1799 -0.0525  0.0181 -0.1502 -0.1421 -0.0458 -0.0108  0.1053
75%    0.5510  0.4997  0.5064  0.7505  0.5904  0.5217  0.6515  0.5790  0.6261  0.7041
max    2.6967  2.3198  2.5974  1.8385  2.2225  2.6081  2.4215  2.0045  2.1077  1.9469
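
For what exclude does in general, here is a small sketch with a hypothetical mixed-dtype frame (mixed, x and flag are made-up names). Columns of the excluded dtype are dropped before the statistics are computed. In d1 every column is float64, so excluding 'bool' removes nothing and the output matches the include version.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame, for illustration only.
mixed = pd.DataFrame({
    'x': [1.0, np.nan, 3.0, 4.0],        # float column containing a NaN
    'flag': [True, False, True, False],  # bool column
})

# Everything except bool columns is described; here that leaves only 'x'.
out = mixed.describe(exclude=['bool'])
print(out)
```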