jb. - 8 months ago 110

Python Question

I'd like to be able to perform fits that allows me to fit an arbitrary curve function to data, and allows me to set arbitrary bounds on parameters, for example I want to fit function:

`f(x) = a1(x-a2)^a3\cdot\exp(-\a4*x^a5)`

and say:

- is in following range:
`a2`

`(-1, 1)`

- and
`a3`

are positive`a5`

There is nice scipy curve_fit function, but it doesn't allow to specify parameter bounds. There also is nice http://code.google.com/p/pyminuit/ library that does generic minimalization, and it allows to set bounds on parameters, but in my case it did not coverge.

Answer

As already mentioned by Rob Falck, you could use, for example, the scipy nonlinear optimization routines in scipy.minimize to minimize an arbitrary error function, e.g. the mean squared error.

Note that the function you gave does not necessarily have real values - maybe this was the reason your minimization in pyminuit did not converge. You have to treat this a little more explicitly, see example 2.

The examples below both use the `L-BFGS-B`

minimization method, which supports bounded parameter regions. I split this answer in two parts:

- A function with real codomain, resembling the one given by you. I added absolutes to ensure the function you gave returns real numbers in the domain [-3,3)
- The actual function you gave, which has a complex codomain

The example below shows optimization of this slightly modified version of your function.

```
import numpy as np
import pylab as pl
from scipy.optimize import minimize
points = 500
xlim = 3.
def f(x,*p):
a1,a2,a3,a4,a5 = p
return a1*np.abs(x-a2)**a3 * np.exp(-a4 * np.abs(x)**a5)
# generate noisy data with known coefficients
p0 = [1.4,-.8,1.1,1.2,2.2]
x = (np.random.rand(points) * 2. - 1.) * xlim
x.sort()
y = f(x,*p0)
y_noise = y + np.random.randn(points) * .05
# mean squared error wrt. noisy data as a function of the parameters
err = lambda p: np.mean((f(x,*p)-y_noise)**2)
# bounded optimization using scipy.minimize
p_init = [1.,-1.,.5,.5,2.]
p_opt = minimize(
err, # minimize wrt to the noisy data
p_init,
bounds=[(None,None),(-1,1),(None,None),(0,None),(None,None)], # set the bounds
method="L-BFGS-B" # this method supports bounds
).x
# plot everything
pl.scatter(x, y_noise, alpha=.2, label="f + noise")
pl.plot(x, y, c='#000000', lw=2., label="f")
pl.plot(x, f(x,*p_opt) ,'--', c='r', lw=2., label="fitted f")
pl.xlabel("x")
pl.ylabel("f(x)")
pl.legend(loc="best")
pl.xlim([-xlim*1.01,xlim*1.01])
pl.show()
```

Extension of the above minimization to the complex domain can be done by explicitly casting to complex numbers and adapting the error function:

First, you cast explicitly the value x to complex-valued to ensure f returns complex values and can actually compute fractional exponents of negative numbers. Second, we compute some error function on both real and imaginary parts - a straightforward candidate is the mean of the squared complex absolutes.

```
import numpy as np
import pylab as pl
from scipy.optimize import minimize
points = 500
xlim = 3.
def f(x,*p):
a1,a2,a3,a4,a5 = p
x = x.astype(complex) # cast x explicitly to complex, to ensure complex valued f
return a1*(x-a2)**a3 * np.exp(-a4 * x**a5)
# generate noisy data with known coefficients
p0 = [1.4,-.8,1.1,1.2,2.2]
x = (np.random.rand(points) * 2. - 1.) * xlim
x.sort()
y = f(x,*p0)
y_noise = y + np.random.randn(points) * .05 + np.random.randn(points) * 1j*.05
# error function chosen as mean of squared absolutes
err = lambda p: np.mean(np.abs(f(x,*p)-y_noise)**2)
# bounded optimization using scipy.minimize
p_init = [1.,-1.,.5,.5,2.]
p_opt = minimize(
err, # minimize wrt to the noisy data
p_init,
bounds=[(None,None),(-1,1),(None,None),(0,None),(None,None)], # set the bounds
method="L-BFGS-B" # this method supports bounds
).x
# plot everything
pl.scatter(x, np.real(y_noise), c='b',alpha=.2, label="re(f) + noise")
pl.scatter(x, np.imag(y_noise), c='r',alpha=.2, label="im(f) + noise")
pl.plot(x, np.real(y), c='b', lw=1., label="re(f)")
pl.plot(x, np.imag(y), c='r', lw=1., label="im(f)")
pl.plot(x, np.real(f(x,*p_opt)) ,'--', c='b', lw=2.5, label="fitted re(f)")
pl.plot(x, np.imag(f(x,*p_opt)) ,'--', c='r', lw=2.5, label="fitted im(f)")
pl.xlabel("x")
pl.ylabel("f(x)")
pl.legend(loc="best")
pl.xlim([-xlim*1.01,xlim*1.01])
pl.show()
```

It seems the minimizer might be a little sensitive to initial values - I therefore placed my first guess (p_init) not too far away from the optimum. Should you have to fight with this, you can use the same minimization procedure in addition to a global optimization loop, e.g. basin-hopping or brute.