Diane - 10 months ago 108

Python Question

I have a time series in pandas that looks like this:

`Values`

1992-08-27 07:46:48 28.0

1992-08-27 08:00:48 28.2

1992-08-27 08:33:48 28.4

1992-08-27 08:43:48 28.8

1992-08-27 08:48:48 29.0

1992-08-27 08:51:48 29.2

1992-08-27 08:53:48 29.6

1992-08-27 08:56:48 29.8

1992-08-27 09:03:48 30.0

I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:

`Values`

1992-08-27 08:00:00 28.2

1992-08-27 08:15:00 28.3

1992-08-27 08:30:00 28.4

1992-08-27 08:45:00 28.8

1992-08-27 09:00:00 29.9

However using the resample method (df.resample('15Min')) from Pandas I get:

`Values`

1992-08-27 08:00:00 28.20

1992-08-27 08:15:00 NaN

1992-08-27 08:30:00 28.60

1992-08-27 08:45:00 29.40

1992-08-27 09:00:00 30.00

I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?

I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.

Thanks in advance for any help I can get.

Answer Source

It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. `np.searchsorted`

is used to find dates closest to the resample point.

```
# empty frame with desired index
rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)
# array of indexes corresponding with closest timestamp after resample
idx_after = np.searchsorted(df.index.values, rs.index.values)
# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]
#calculate new weighted value
rs['span'] = (rs['after_time'] - rs['before_time'])
rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']
# I got errors here unless I turn the index to a series
rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']
rs['Values'] = rs.eval('before * before_weight + after * after_weight')
```

After all that, hopefully the right answer:

```
In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00 28.011429
1992-08-27 08:15:00 28.313939
1992-08-27 08:30:00 28.223030
1992-08-27 08:45:00 28.952000
1992-08-27 09:00:00 29.908571
Freq: 15T, Name: Values, dtype: float64
```