BenT - 3 months ago
Python Question

Pandas Interpolate dataframe with new length

I have a dataframe with the columns Datetime, lat, lon and z. I am reading the data in from a CSV file, so setting the period for the datetimes does not work. The times are at 6-hour intervals, but I want to linearly interpolate the data to hourly intervals.

Go from

'A' 'B' 'C' 'D'
0 2010-09-13 18:00:00 16.3 -78.5 1
1 2010-09-14 00:00:00 16.6 -79.8 6
2 2010-09-14 06:00:00 17.0 -81.1 12


To

'A' 'B' 'C' 'D'
1 2010-09-13 18:00:00 16.3 -78.5 1
2 2010-09-13 19:00:00 16.35 -78.7 2
3 2010-09-13 20:00:00 16.4 -78.9 3
4 2010-09-13 21:00:00 16.45 -79.1 4
5 2010-09-13 22:00:00 16.5 -79.3 5
....


I have tried the interpolate method, but it has no argument for changing the length of the dataframe.

df.interpolate(method='linear')


I was thinking that I could use .loc to insert 5 rows of NaNs between each pair of rows in the dataframe and then use the interpolation function, but that seems like a bad workaround.

Solution
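Reindexing with a DatetimeIndex drops the values of the other columns if the initial column was not imported as datetime, because none of the existing index labels match the new timestamps. Convert column A to datetime and make it the index first:

df['A'] = pd.to_datetime(df['A'])
df = df.set_index('A')
i = pd.date_range(start=df.index.min(), end=df.index.max(), freq='h')
df = df.reindex(i).interpolate()
print(df)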


Gives the correct answer.
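Since the data comes from a CSV file, the datetime conversion can also happen while reading. The following is a minimal sketch, assuming the file has a header row naming the columns A, B, C and D as in the question. The in-memory `csv_text` is a hypothetical stand-in for the real file, which the question does not show.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file from the question.
csv_text = """A,B,C,D
2010-09-13 18:00:00,16.3,-78.5,1
2010-09-14 00:00:00,16.6,-79.8,6
2010-09-14 06:00:00,17.0,-81.1,12
"""

# parse_dates converts column A to datetimes; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["A"], index_col="A")

# Hourly index spanning the data; the frequency is given as a Timedelta
# so the sketch does not depend on a particular frequency alias.
hourly = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))

# New timestamps get NaN rows, which interpolate() then fills linearly.
result = df.reindex(hourly).interpolate()
print(result.head())
```

With the index parsed at read time, no separate conversion step is needed before reindexing.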

Answer
# Assumes df is already indexed by the datetimes (e.g. df = df.set_index('A')).
i = pd.date_range(start=df.index.min(), end=df.index.max(), freq='h')
df = df.reindex(i).interpolate()
print(df)

outputs

2010-09-13 18:00:00  16.300000 -78.500000
2010-09-13 19:00:00  16.350000 -78.716667
2010-09-13 20:00:00  16.400000 -78.933333
2010-09-13 21:00:00  16.450000 -79.150000
2010-09-13 22:00:00  16.500000 -79.366667

  1. Create a new DatetimeIndex with the desired frequency (pd.date_range builds one).

  2. reindex with this new index. By default, the values for new index entries will be np.nan.

  3. interpolate to fill in these missing values. You can supply the method kwarg to choose how the interpolation is done.
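The three steps can be put together as an end-to-end sketch on the sample rows from the question. It assumes column A starts out as plain strings, so it is converted and set as the index before step 1. The names `hourly`, `expanded` and `result` are illustrative only.

```python
import pandas as pd

# Sample rows from the question; column A starts out as plain strings.
df = pd.DataFrame({
    "A": ["2010-09-13 18:00:00", "2010-09-14 00:00:00", "2010-09-14 06:00:00"],
    "B": [16.3, 16.6, 17.0],
    "C": [-78.5, -79.8, -81.1],
    "D": [1, 6, 12],
})

# Make the timestamps the index so that reindex can align on them.
df["A"] = pd.to_datetime(df["A"])
df = df.set_index("A")

# Step 1: hourly index spanning the data ("60min" is one hour).
hourly = pd.date_range(df.index.min(), df.index.max(), freq="60min")

# Step 2: rows for the new timestamps are filled with NaN.
expanded = df.reindex(hourly)

# Step 3: linear interpolation; other methods can be passed via method=...
result = expanded.interpolate(method="linear")

# Restore A as an ordinary column, matching the layout in the question.
result = result.rename_axis("A").reset_index()
print(result.head())
```

Keeping `expanded` as a separate variable makes it easy to inspect the NaN rows before they are filled.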
