durbachit - 1 year ago 197

Python Question

I have a dataframe like this (just much bigger, with a smaller step in x):

```
       x  val1  val2   val3
0    0.0  10.0   NaN    NaN
1    0.5  10.5   NaN    NaN
2    1.0  11.0   NaN    NaN
3    1.5  11.5   NaN  11.60
4    2.0  12.0   NaN  12.08
5    2.5  12.5  12.2  12.56
6    3.0  13.0  19.8  13.04
7    3.5  13.5  13.3  13.52
8    4.0  14.0  19.8  14.00
9    4.5  14.5  14.4  14.48
10   5.0   NaN  19.8  14.96
11   5.5  15.5  15.5  15.44
12   6.0  16.0  19.8  15.92
13   6.5  16.5  16.6  16.40
14   7.0  17.0  19.8  18.00
15   7.5  17.5  17.7    NaN
16   8.0  18.0  19.8    NaN
17   8.5  18.5  18.8    NaN
18   9.0  19.0  19.8    NaN
19   9.5  19.5  19.9    NaN
20  10.0  20.0  19.8    NaN
```

My original issue was calculating derivatives for each of the columns and it was resolved in this question: How to get indexes of values in a Pandas DataFrame?

Combined with my previous code, the solution posted by Alexander looks like this:

```
import pandas as pd
import numpy as np

df = pd.read_csv('H:/DocumentsRedir/pokus/dataframe.csv', delimiter=',')
vals = list(df.columns.values)[1:]

dVal = df.iloc[:, 1:].diff()  # `x` is in column 0.
dX = df['x'].diff()
dVal.apply(lambda series: series / dX)
```

However, I need to do some smoothing (let's say to 2 m here, from the original 0.5 m spacing of x), because the values of the derivatives just get crazy at the fine scale.

I have tried a Butterworth low-pass filter applied with `filtfilt`. This is how I modified the code:

```
step = 0.5
relevant_scale = 2
order_butterworth = 4

b, a = butter(order_butterworth, step/relevant_scale, btype='low', analog=False)
smoothed=filtfilt(b,a,data.iloc[:, 1:]) # the first column is x

dVal = smoothed.diff()
dz = data['Depth'].diff()
derivative = (dVal.apply(lambda series: series / dz))*1000
```

But the resulting `smoothed` was an array of NaNs, and I got this error:

`AttributeError: 'numpy.ndarray' object has no attribute 'diff'`

This problem was solved by the answer at http://stackoverflow.com/a/38691551/5553319, and the code works well on continuous data. However, what happens with the barely noticeable change I made to the source data (a NaN value in the middle)?

So how can we make this solution stable even when a data point is missing from an otherwise continuous array of data?

Ok, also answered in the comments. Such missing datapoints need to be interpolated.
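As a minimal sketch of that interpolation step, the snippet below takes a few rows from the table above and fills only the gap that lies between valid values. The choice of `method='index'` (interpolate against the x values) and `limit_area='inside'` (leave leading and trailing NaNs alone) is an assumption about what is wanted here, not something taken from the comments.

```python
import numpy as np
import pandas as pd

# A few rows copied from the table above: val1 has a gap at x = 5.0,
# and val3 has no value in the last row.
df = pd.DataFrame({
    'x':    [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    'val1': [14.0, 14.5, np.nan, 15.5, 16.0, 16.5, 17.0, 17.5],
    'val3': [14.00, 14.48, 14.96, 15.44, 15.92, 16.40, 18.00, np.nan],
}).set_index('x')

# Interpolate linearly against the x index, but only where a NaN is
# surrounded by valid values; NaNs at the ends of a column are kept.
filled = df.interpolate(method='index', limit_area='inside')
print(filled)
```

The gap in `val1` is filled from its two neighbours, while the trailing NaN in `val3` is left untouched, so it can still be dropped before filtering.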

Answer Source

The error you are seeing occurs because you are calling the method `.diff()` on the result of `filtfilt`, which is a numpy array and does not have that method. If you really want a simple finite-difference derivative, you can just use `np.gradient(smoothed)`.
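To illustrate the point about `filtfilt` returning a plain array, here is a minimal sketch on synthetic data. The straight-line signal and the cutoff of 0.25 (taken as step/relevant_scale from the question) are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

# Synthetic stand-in for one data column, sampled every 0.5 in x.
step = 0.5
x = np.arange(0.0, 20.0, step)
series = pd.Series(10.0 + 2.0 * x, index=x)

b, a = butter(4, 0.25, btype='low')
smoothed = filtfilt(b, a, series.to_numpy())  # a numpy.ndarray, not a Series

print(type(smoothed).__name__, hasattr(smoothed, 'diff'))

# Option 1: differentiate the array directly, passing the sample positions.
derivative = np.gradient(smoothed, x)

# Option 2: wrap the array back into a Series so .diff() is available again.
smoothed_series = pd.Series(smoothed, index=series.index)
derivative_diff = smoothed_series.diff() / step
```

Note that `np.gradient` returns an array of the same length as its input, whereas the `.diff()` route leaves a NaN in the first position.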

Now, it appears that your real goal is to obtain a lag-free estimate of the derivative of a noisy signal. I would recommend that you instead use something like the Savitzky-Golay filter, which gives you the derivative estimate in a single application of the filter. You can see an example of derivative estimation on a noisy signal here.
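A minimal sketch of the idea, on a synthetic signal whose true derivative is known, might look like this. The sine signal, noise level, window length of 21 and polynomial order of 2 are all assumptions chosen for illustration, not values tuned for your data.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
dx = 0.05
x = np.arange(0.0, 10.0, dx)
noisy = np.sin(x) + rng.normal(scale=0.01, size=x.size)
true_derivative = np.cos(x)

# Plain finite differences amplify the noise.
naive = np.gradient(noisy, dx)

# Savitzky-Golay: fit a quadratic in a sliding 21-sample window and
# return the first derivative of the fit, scaled by the spacing dx.
sg = savgol_filter(noisy, window_length=21, polyorder=2, deriv=1, delta=dx)

# Compare RMS errors away from the edges, where both methods behave differently.
inner = slice(10, -10)
err_naive = np.sqrt(np.mean((naive[inner] - true_derivative[inner]) ** 2))
err_sg = np.sqrt(np.mean((sg[inner] - true_derivative[inner]) ** 2))
print(err_naive, err_sg)
```

A wider window smooths more but also biases the estimate where the signal curves strongly, so the window length is a trade-off to settle against the scale you care about.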

You will also need to accommodate the `NaN`s in your data. Here is how I would do it with your data:

```
import scipy.signal
import matplotlib.pyplot as plt

# Use x as the index so each column keeps track of its own x values.
df = df.set_index('x')
dx = df.index[1] - df.index[0]  # sample spacing (assumes uniform x)

for col in df:
    # Get rid of NaNs.
    # NOTE: If you have NaNs in between your data points, this does the wrong thing,
    # but for the data you show for contiguous data this is fine.
    nonans = df[col].dropna()
    # 5-point window, quadratic fit, first derivative scaled by the spacing dx.
    smoothed = scipy.signal.savgol_filter(nonans, 5, 2, deriv=1, delta=dx)
    plt.plot(nonans.index, smoothed, label=col)

plt.legend()
plt.show()
```

This results in the following figure: