xplodnow xplodnow - 17 days ago 4
Python Question

Curve fitting with large number of data points

this is quite a specific problem I was hoping the community could help me out with. Thanks in advance.

So I have 2 sets of data, one is experimental and the other is based off of an equation. I am trying to fit my data points to this curve and hence obtain the missing variables I am interested in. Namely, a and b in the Ebfit function.

Here is the code:

%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as spys
from scipy.optimize import curve_fit

time = [60,220,520,1840]
Moment = [0.64227262,0.468318916,0.197100772,0.104512508]

Temperature = 25 # Bake temperature in degrees C
Nb = len(Moment) # Number of bake measurements
Baketime_a = time #[s]
N_Device = 10000 # No. of devices considered in the array
T_ambient = 273 + Temperature
kt = 0.0256*(T_ambient/298) # In units of eV
f0 = 1e9 # Attempt frequency


def Ebfit(x,a,b):
Eb_mean = a*(0.0256/kt) # Eb at bake temperature
Eb_sigma = b*Eb_mean
Foursigma = 4*Eb_sigma
Eb_a = np.linspace(Eb_mean-Foursigma,Eb_mean+Foursigma,N_Device)
dEb = Eb_a[1] - Eb_a[0]
pdfEb_a = spys.norm.pdf(Eb_a,Eb_mean,Eb_sigma)

## Retention Time

DMom = np.zeros(len(x),float)
tau = (1/f0)*np.exp(Eb_a)
for bb in range(len(x)):
DMom[bb]= (1 - 2*(sum(pdfEb_a*(1 - np.exp(np.divide(-x[bb],tau))))*dEb))
return DMom

a = 30
b = 0.10

params,extras = curve_fit(Ebfit,time,Moment)

x_new = list(range(0,2000,1))
y_new = Ebfit(x_new,params[0],params[1])

plt.plot(time,Moment, 'o', label = 'data points')
plt.plot(x_new,y_new, label = 'fitted curve')
plt.legend()


The main problem I am having is that the fitting of the data to the function does not work when I use large number of points. In the above code When I use the 4 points (time & moment), this code works fine.

I get the following values for a and b.

array([ 29.11832766, 0.13918353])

The expected values for a is (23-50) and b is (0.06 - 0.15). So these values are within the acceptable range. This is the corresponding plot: Plot 1

However, when I use my actual experimental normalized data with about 500 points.

EDIT: This data:

Normalized Data

https://www.dropbox.com/s/64zke4wckxc1r75/Normalized%20Data.csv?dl=0

Raw Data

https://www.dropbox.com/s/ojgse5ibp59r8nw/Data1.csv?dl=0

I get the following values and plot for a and b which are out of the acceptable range,

array([-13.76687781, -12.90494196])
Plot 2

I know these values are wrong and if I were to do it manually (slowly adjusting values to obtain the proper fit) it would be around a=30.1 and b=0.09. And when plotted looks as such: Plot 3

I have tried changing the initial guess values for a & b, other sets of experimental data as well and other suggestions in similar threads. None seem to work for me. Any help you can provide is appreciated. Thanks.

.
.
.
.

ADDITIONAL INFORMATION

The model I am trying to fit the data to comes from the following equation:
Equation

where Dmom = 1 - 2*Psw

a is the Eb value while b is the Sigma value where, Eb has a range of values determined by the probability density function and 4 times of the sigma values (i.e. Foursigma). This distribution is then summed up to use for the final equation.

Answer

It looks like you do need to play around with the initial guesses for a and b after all. Perhaps the function you're fitting is not very well behaved, which is why it's so prone to fail for intitial guesses away from the global minumum. That being said, here's a working example of how to fit your data:

import pandas as pd
data_df = pd.read_csv('data.csv')
time = data_df['Time since start, Time [s]'].values
moment = data_df['Signal X direction, Moment [emu]'].values

params, extras = curve_fit(Ebfit, time, moment, p0=[40, 0.3])

Yields the values of a and b of:

In [6]: params
Out[6]: array([ 30.47553689,   0.08839412])

Which results in a nicely aligned fit of a function.

x_big = np.linspace(1, 1800, 2000)
y_big = Ebfit(x_big, params[0], params[1])

plt.plot(time, moment, 'o', alpha=0.5, label='all points')
plt.plot(x_big, y_big, label = 'fitted curve')
plt.legend()
plt.show()

fits