Diane Diane - 2 months ago 25
Python Question

Python regularise irregular time series with linear interpolation

I have a time series in pandas that looks like this:

Values
1992-08-27 07:46:48 28.0
1992-08-27 08:00:48 28.2
1992-08-27 08:33:48 28.4
1992-08-27 08:43:48 28.8
1992-08-27 08:48:48 29.0
1992-08-27 08:51:48 29.2
1992-08-27 08:53:48 29.6
1992-08-27 08:56:48 29.8
1992-08-27 09:03:48 30.0


I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:

Values
1992-08-27 08:00:00 28.2
1992-08-27 08:15:00 28.3
1992-08-27 08:30:00 28.4
1992-08-27 08:45:00 28.8
1992-08-27 09:00:00 29.9


However using the resample method (df.resample('15Min')) from Pandas I get:

Values
1992-08-27 08:00:00 28.20
1992-08-27 08:15:00 NaN
1992-08-27 08:30:00 28.60
1992-08-27 08:45:00 29.40
1992-08-27 09:00:00 30.00


I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?

I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.

Thanks in advance for any help I can get.

Answer

It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted is used to find dates closest to the resample point.

# empty frame with desired index
rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)

# array of indexes corresponding with closest timestamp after resample
idx_after = np.searchsorted(df.index.values, rs.index.values)

# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]

#calculate new weighted value
rs['span'] = (rs['after_time'] - rs['before_time'])
rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']
# I got errors here unless I turn the index to a series
rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']

rs['Values'] = rs.eval('before * before_weight + after * after_weight')

After all that, hopefully the right answer:

In [161]: rs['Values']
Out[161]: 
1992-08-27 08:00:00    28.011429
1992-08-27 08:15:00    28.313939
1992-08-27 08:30:00    28.223030
1992-08-27 08:45:00    28.952000
1992-08-27 09:00:00    29.908571
Freq: 15T, Name: Values, dtype: float64