Robin - 6 months ago 114

Python Question

I have two numpy arrays light_points and time_points and would like to use some time series analysis methods on those data.

I then tried this:

```
import statsmodels.api as sm
import pandas as pd

tdf = pd.DataFrame({'time':time_points[:]})
rdf = pd.DataFrame({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])
```

This works but is not doing the correct thing.

Indeed, the measurements are not evenly spaced in time, and if I just declare the time_points pandas DataFrame as the index of my frame, I get an error:

```
rdf.index = pd.DatetimeIndex(tdf['time'])
decomp = sm.tsa.seasonal_decompose(rdf)

elif freq is None:
    raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")
ValueError: You must specify a freq or x must be a pandas object with a timeseries index
```

I don't know how to correct this.

Also, it seems that pandas' `TimeSeries`

I tried this:

```
rdf = pd.Series({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])
```

But it gives me a length mismatch:

`ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements`

Nevertheless, I don't understand where it comes from, as rdf['light'] and tdf['time'] are of the same length...

Finally, I tried defining my rdf as a pandas Series:

`rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))`

And I get this :

`ValueError: You must specify a freq or x must be a pandas object with a timeseries index`

Then I tried replacing the index with `pd.TimeSeries(time_points[:])` instead.

This gives me an error on the line that calls seasonal_decompose:

`AttributeError: 'Float64Index' object has no attribute 'inferred_freq'`

How can I work with unevenly spaced data?

I was thinking about creating an approximately evenly spaced time array by adding many unknown values between the existing values and using interpolation to "evaluate" those points, but I think there could be a cleaner and easier solution.

Answer

`seasonal_decompose()` requires a `freq` that is either provided as part of the `DatetimeIndex` meta information, can be inferred by `pandas.Index.inferred_freq`, or else is supplied by the user as an integer that gives the number of periods per cycle, e.g. 12 for monthly data (from the docstring for `seasonal_mean`):

```
def seasonal_decompose(x, model="additive", filt=None, freq=None):
    """
    Parameters
    ----------
    x : array-like
        Time series
    model : str {"additive", "multiplicative"}
        Type of seasonal component. Abbreviations are accepted.
    filt : array-like
        The filter coefficients for filtering out the seasonal component.
        The default is a symmetric moving average.
    freq : int, optional
        Frequency of the series. Must be used if x is not a pandas
        object with a timeseries index.
```
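The first two sources of `freq` can be inspected directly on an index. The following is a small sketch using only pandas; the variable names (`regular`, `rebuilt`, `irregular`) and the 20-week range are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# 20 weekly timestamps; date_range stores the frequency on the index itself.
regular = pd.date_range(start="2015-01-01", periods=20, freq="W")
print(regular.freqstr)

# Rebuilding the index from the raw values drops that metadata, but the
# even spacing still allows pandas to infer a frequency.
rebuilt = pd.DatetimeIndex(regular.values)
print(rebuilt.freq, rebuilt.inferred_freq)

# Removing a few timestamps makes the spacing uneven, so there is neither
# stored metadata nor anything to infer.
irregular = regular.delete([3, 4, 10])
print(irregular.freq, irregular.inferred_freq)
```

When both attributes are `None`, as for `irregular`, the only remaining option is to pass the number of periods per cycle explicitly.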

To illustrate - using random sample data:

```
from datetime import datetime

import numpy as np
import pandas as pd
import statsmodels.api as sm

length = 400
x = np.sin(np.arange(length)) * 10 + np.random.randn(length)
df = pd.DataFrame(data=x, index=pd.date_range(start=datetime(2015, 1, 1), periods=length, freq='W'), columns=['value'])
df.info()
# <class 'pandas.core.frame.DataFrame'>
# DatetimeIndex: 400 entries, 2015-01-04 to 2022-08-28
# Freq: W-SUN

# The weekly index carries its own frequency, so no freq argument is needed.
decomp = sm.tsa.seasonal_decompose(df)
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']
data.info()
# Data columns (total 4 columns):
# series 400 non-null float64
# trend 348 non-null float64
# seasonal 400 non-null float64
# resid 348 non-null float64
# dtypes: float64(4)
# memory usage: 15.6 KB
```

So far, so good. Now randomly drop elements from the `DatetimeIndex` to create unevenly spaced data:

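```
# Draw random row positions with replacement and keep the unique ones.
# size must be an integer, hence the int() around length * .8.
df = df.iloc[np.unique(np.random.randint(low=0, high=length, size=int(length * .8)))]
df.info()
# <class 'pandas.core.frame.DataFrame'>
# DatetimeIndex: 222 entries, 2015-01-11 to 2022-08-21
# Data columns (total 1 columns):
# value 222 non-null float64
# dtypes: float64(1)
# memory usage: 3.5 KB
df.index.freq
# None
df.index.inferred_freq
# None
```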

Running `seasonal_decompose` on this data 'works':

```
# freq=52: one cycle per year of weekly data
# (newer statsmodels releases name this argument `period`)
decomp = sm.tsa.seasonal_decompose(df, freq=52)
data = pd.concat([df, decomp.trend, decomp.seasonal, decomp.resid], axis=1)
data.columns = ['series', 'trend', 'seasonal', 'resid']
data.info()
# DatetimeIndex: 224 entries, 2015-01-04 to 2022-08-07
# Data columns (total 4 columns):
# series 224 non-null float64
# trend 172 non-null float64
# seasonal 224 non-null float64
# resid 172 non-null float64
# dtypes: float64(4)
# memory usage: 8.8 KB
```
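An alternative, in line with the interpolation idea raised in the question, is to put the observations back on a regular grid before decomposing. The sketch below uses only pandas and NumPy; the synthetic `uneven` series, the weekly target grid, and time-based linear interpolation are all assumptions chosen for illustration, and a different grid or fill method may suit real measurements better.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the unevenly spaced series: weekly data with
# a fifth of the rows removed at random.
full_index = pd.date_range(start="2015-01-01", periods=400, freq="W")
values = np.sin(np.arange(400)) * 10 + rng.standard_normal(400)
keep = np.sort(rng.choice(400, size=320, replace=False))
uneven = pd.Series(values[keep], index=full_index[keep], name="value")

# Put the observations back on a weekly grid (weeks without data become
# NaN), then fill those gaps by interpolating linearly in time.
regular = uneven.resample("W").mean().interpolate(method="time")

print(uneven.index.inferred_freq)
print(regular.index.freqstr)
print(len(uneven), len(regular))
```

The resulting index carries a weekly frequency again, so `regular` could be passed to `sm.tsa.seasonal_decompose` without an explicit frequency. The filled-in values are estimates, so anything that happened inside a gap is smoothed over.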

The question is how useful the result is. Even without gaps in the data that complicate inference of seasonal patterns (see the example use of `.interpolate()` in the release notes), `statsmodels` qualifies this procedure as follows:

```
Notes
-----
This is a naive decomposition. More sophisticated methods should
be preferred.

The additive model is Y[t] = T[t] + S[t] + e[t]

The multiplicative model is Y[t] = T[t] * S[t] * e[t]

The seasonal component is first removed by applying a convolution
filter to the data. The average of this smoothed series for each
period is the returned seasonal component.
```
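To make the additive case in these notes concrete, here is a rough sketch in plain NumPy. It is an illustration under assumptions, not the `statsmodels` source: `naive_additive_decompose` is a hypothetical helper, a centered moving average over one full cycle stands in for the convolution filter, the per-period averages are shifted to zero mean by choice, and the demo series is synthetic.

```python
import numpy as np


def naive_additive_decompose(x, period):
    """Split x into trend, seasonal and residual parts (additive model)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    # Centered moving average over one full cycle. For an even period the
    # two end weights are halved so the window stays symmetric.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2

    # The average is undefined near the edges, so those entries stay NaN.
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # Average the detrended values position by position within the cycle.
    detrended = x - trend
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    means -= means.mean()

    seasonal = np.tile(means, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid


# Synthetic monthly-style series: linear trend plus a 12-step cycle.
t = np.arange(120)
x = 0.1 * t + 10 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = naive_additive_decompose(x, period=12)
print(np.isnan(trend).sum(), np.nanmax(np.abs(resid)))
```

Note that the function only sees positions in the array, never timestamps. With gaps in the data, position `i` and position `i + period` are no longer one cycle apart in time, which is one reason the decomposition of the thinned series above should be read with caution.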