mvsrs mvsrs - 6 months ago 59
Python Question

Fit data to all possible distributions and return the best fit

I have sample data and want to find the best-fitting distribution. I have found a couple of links suggesting that I can import the distributions from scipy.stats, but I do not know the type of the data beforehand. I want something similar to allfitdist() in MATLAB, which tries to fit the data to around 20 distributions and returns the best fit.

Link for allfitdist(): http://www.mathworks.in/matlabcentral/fileexchange/34943-fit-all-valid-parametric-probability-distributions-to-data

Any help is highly appreciated. Thanks.

Answer

You can create a list of the distributions available in scipy.stats, fit each one to the data, and keep the one with the best likelihood. An example with two distributions and random data:

import numpy as np
import scipy.stats as st

# Random sample data; replace with your own observations.
data = np.random.random(10000)
distributions = [st.laplace, st.norm]
nlls = []

for distribution in distributions:
    # Fit the distribution's parameters by maximum likelihood.
    pars = distribution.fit(data)
    # nnlf returns the negative log-likelihood; lower means a better fit.
    nll = distribution.nnlf(pars, data)
    nlls.append(nll)

# Pick the distribution with the smallest negative log-likelihood.
best_fit = sorted(zip(distributions, nlls), key=lambda d: d[1])[0]
print('Best fit reached using {}, negative log-likelihood: {}'.format(best_fit[0].name, best_fit[1]))
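
The same loop can be extended to a longer candidate list. The following is only a sketch under some assumptions that are not part of the answer above: the helper name rank_distributions is hypothetical, the candidate list is an arbitrary small selection, and the ranking uses AIC (2k + 2 * negative log-likelihood, with k the number of fitted parameters) so that distributions with more parameters are not favored automatically. Fits that fail or give a non-finite likelihood are simply skipped.

```python
import warnings

import numpy as np
import scipy.stats as st


def rank_distributions(data, candidates):
    """Fit each candidate and return (name, aic, params) tuples, best first.

    Hypothetical helper for illustration; not part of scipy.
    """
    results = []
    for dist in candidates:
        try:
            with warnings.catch_warnings():
                # Some fits emit runtime warnings on unsuitable data.
                warnings.simplefilter("ignore")
                params = dist.fit(data)
                nll = dist.nnlf(params, data)  # negative log-likelihood
        except Exception:
            continue  # skip distributions whose fit fails on this data
        if not np.isfinite(nll):
            continue  # skip fits with an unusable likelihood
        # AIC = 2k - 2 ln(L) = 2k + 2 * nll; lower is better.
        aic = 2 * len(params) + 2 * nll
        results.append((dist.name, aic, params))
    return sorted(results, key=lambda r: r[1])


# Synthetic, normally distributed sample (an assumption for the demo).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=2000)

candidates = [st.norm, st.laplace, st.uniform, st.expon]
ranking = rank_distributions(sample, candidates)
for name, aic, _ in ranking:
    print("{:10s} AIC = {:.1f}".format(name, aic))
```

The first entry of ranking is the best candidate under this criterion. A low AIC only means a distribution fits better than the other candidates tried, so it may still be worth checking the winner with a plot or a goodness-of-fit test such as st.kstest.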