Jon Clements - 29 days ago
Python Question

Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

I have a `pandas.DataFrame` called `df` which has an automatically generated index, with a column `dt`:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))


What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))


This works, so that's fine. However, I've an inkling there's some nice way using `pandas.tseries.offsets` or creating a `DatetimeIndex` or similar.

So if possible, is there some `pandas` wizardry to do this?

Answer

Here's one alternative method - it doesn't use any of Pandas' functionality (so might not be completely in the spirit of your question), but it is fast and concise.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                    dt
0  2014-10-01 10:02:45
1  2014-10-01 13:08:17
2  2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on.
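As a self-contained sketch of the unit-casting approach for several units, something like the following should work. The helper name `truncate_to_unit` is hypothetical, and the cast back to `datetime64[ns]` is an addition of mine: it is meant to keep the column dtype the same across pandas versions, since newer releases may otherwise keep a coarser resolution.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})


def truncate_to_unit(values, unit):
    """Hypothetical helper: truncate a datetime64 array to a NumPy unit.

    `unit` is a NumPy datetime unit code such as 'h', 'm' or 'M'.
    """
    arr = np.asarray(values, dtype="datetime64[ns]")
    # Casting to a coarser unit discards the finer components;
    # casting back to [ns] restores the original dtype.
    return arr.astype(f"datetime64[{unit}]").astype("datetime64[ns]")


df["dt_hour"] = truncate_to_unit(df["dt"].values, "h")
df["dt_minute"] = truncate_to_unit(df["dt"].values, "m")
df["dt_month"] = truncate_to_unit(df["dt"].values, "M")

print(df)
```

Note that the unit codes are case-sensitive: `'M'` is months and `'m'` is minutes.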

(The only route within Pandas I could see was to do something like pd.DatetimeIndex(df['dt']).to_period('h') which seems heavy in comparison and changes the dtype of the column.)
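For comparison, here is a sketch of two pandas-only routes. It assumes a pandas version that provides the `Series.dt.floor` accessor method, which is not something the answer above relies on; the column names `dt_floor` and `dt_period` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Floor each timestamp to the start of its hour; the result is still
# a datetime64 column.
df["dt_floor"] = df["dt"].dt.floor("h")

# The period route mentioned above: the result is a Period column,
# not a datetime64 one.
df["dt_period"] = df["dt"].dt.to_period("h")

print(df)
print(df.dtypes)
```

As far as I know, `floor` only accepts fixed-width frequencies (hours, minutes, seconds and so on), so truncating to months would still need the NumPy cast or `to_period`.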
