Graham Slick - 2 months ago
Python Question

Filtering / smoothing a step function to retrieve biggest increments

I have a pandas Series indexed by datetimes.

I can plot my function with the step() function, which plots each point of the Series against time (x is the time).

I want a coarser view of the evolution over time, so I need to reduce the number of steps and ignore the smallest increments.
The only way I found is to use the poly1d() function from numpy to interpolate the points with a polynomial, and then plot that function as steps. Unfortunately, I am losing the time index during the transformation, because a polynomial is indexed by plain x values.

Is there a way to ‘simplify’ my function so that I only get the dates (x values) of the biggest changes on the y axis, instead of all the dates on which any change occurs?
As I wrote above, I'd like to have only the biggest increments and not the small changes.

Here is the exact data:

2016-01-02 -5.418440
2016-01-09 -9.137942
2016-01-16 -9.137942
2016-01-23 -9.137942
2016-01-30 -9.137942
2016-02-06 -11.795107
2016-02-13 -11.795107
2016-02-20 -11.795107
2016-02-27 -11.795107
2016-03-05 -11.795107
2016-03-12 -13.106988
2016-03-19 -13.106988
2016-03-26 -13.106988
2016-04-02 -13.106988
2016-04-09 -13.106988
2016-04-16 -13.106988
2016-04-23 -13.106988
2016-04-30 -11.458878
2016-05-07 0.051123
2016-05-14 2.010179
2016-05-21 -3.210870
2016-05-28 -0.726291
2016-06-04 5.841818
2016-06-11 5.067061
2016-06-18 5.789375
2016-06-25 16.455159
2016-07-02 22.518294
2016-07-09 39.834977
2016-07-16 54.685965
2016-07-23 54.685965
2016-07-30 55.169290
2016-08-06 55.169290
2016-08-13 55.169290
2016-08-20 53.366569
2016-08-27 45.758675
2016-09-03 10.976592
2016-09-10 -0.554887
2016-09-17 -8.653451
2016-09-24 -18.198305
2016-10-01 -22.218711
2016-10-08 -21.158434
2016-10-15 -11.723798
2016-10-22 -9.928957
2016-10-29 -17.498315
2016-11-05 -22.850454
2016-11-12 -25.190656
2016-11-19 -27.250960
2016-11-26 -27.250960
2016-12-03 -27.250960
2016-12-10 -27.250960

Answer

One way is to create a mask from your original series by comparing the absolute difference between each value and the previous one against your sensitivity threshold. The mask is simply a boolean Series, aligned with the original index, that you use to filter the original series.

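# my_series is your Series
threshold = 10.0
# diff() is a method and must be called; the first element of the result is NaN
diff_series = my_series.diff().abs()
# NaN > threshold evaluates to False, so the first point is never selected
mask = diff_series > threshold
# now plot only the masked values, or create a new Series from them, etc.
my_series[mask].plot()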
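
To make the idea concrete, here is a minimal self-contained sketch that applies the same mask to a few of the weekly values from the question. The threshold of 5.0 and the names big_changes, keep and simplified are illustrative assumptions, not part of the original answer; tune the threshold to your own data.

```python
import pandas as pd

# A short excerpt of the weekly values from the question, indexed by date.
my_series = pd.Series(
    [-13.106988, -11.458878, 0.051123, 2.010179,
     -3.210870, -0.726291, 5.841818],
    index=pd.to_datetime([
        "2016-04-23", "2016-04-30", "2016-05-07", "2016-05-14",
        "2016-05-21", "2016-05-28", "2016-06-04",
    ]),
)

threshold = 5.0  # assumed sensitivity, chosen only for this excerpt

# Absolute change from the previous observation (the first entry is NaN).
diff_series = my_series.diff().abs()

# True where the jump exceeds the threshold; the NaN compares as False.
mask = diff_series > threshold

# The dates (and values) at which a large change occurred.
big_changes = my_series[mask]

# Optionally keep the very first point as well, so that a step plot of
# the reduced series starts where the original series starts.
keep = mask.copy()
keep.iloc[0] = True
simplified = my_series[keep]

print(big_changes)
```

Because the selection is done by boolean indexing, the datetime index is preserved, so something like simplified.plot(drawstyle="steps-post") should draw the reduced step function against the original dates. Note that this only compares consecutive values: a slow drift made of many small steps is never flagged, so if that matters you would need to compare against the last kept value instead.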