Jon Clements - 1 year ago
Python Question

Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

I have a pandas DataFrame, df, which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's a nicer way using pandas' own date/time functionality or similar.

So if possible, is there some pandas wizardry to do this?

Answer Source

Here's one alternative method - it doesn't use any of Pandas' functionality (so might not be completely in the spirit of your question), but it is fast and concise.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                   dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]
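
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The three timestamps are the ones from the session above; the variable name `truncated` is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the example session above.
df = pd.DataFrame({'dt': pd.to_datetime([
    '2014-10-01 10:02:45',
    '2014-10-01 13:08:17',
    '2014-10-01 17:39:24',
])})

# View the column as a NumPy array and cast it to hour precision;
# minutes, seconds and sub-second detail are dropped by the cast.
truncated = df['dt'].values.astype('<M8[h]')

# Assigning the array back stores it as a datetime column again.
df['dt2'] = truncated

print(df)
```

The exact dtype pandas reports for `dt2` may differ between pandas versions, so the sketch only relies on the stored values.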

The same method should work for any other unit: months 'M', minutes 'm', and so on.
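
As a rough illustration of the other units, the sketch below applies the same cast with 'm', 'D' and 'M' directly on a NumPy array. The helper name `truncate` and the sample values are made up for this example; casting back to [ns] afterwards is an optional extra step, shown only so that every result shares one dtype.

```python
import numpy as np

# Hypothetical sample values, chosen so that each unit changes something.
stamps = np.array(
    ['2014-10-01T10:02:45', '2014-11-15T13:08:17'],
    dtype='datetime64[ns]',
)

def truncate(values, unit):
    """Cast to a coarser unit (dropping finer detail), then back to ns."""
    coarse = values.astype('datetime64[{}]'.format(unit))
    return coarse.astype('datetime64[ns]')

to_minute = truncate(stamps, 'm')  # seconds dropped
to_day = truncate(stamps, 'D')     # time of day dropped
to_month = truncate(stamps, 'M')   # day of month reset to the 1st

print(to_minute, to_day, to_month, sep='\n')
```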

(The only route within Pandas I could see was to do something like pd.DatetimeIndex(df['dt']).to_period('h') which seems heavy in comparison and changes the dtype of the column.)
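
For comparison, here is a hedged sketch of the pandas-only route. The first part follows the `to_period` idea from the paragraph above and converts the periods back to timestamps, which is an extra step not in the original answer. The second part uses `Series.dt.floor`, an alternative the answer does not mention; it assumes a pandas version that provides it. The column names `dt2` and `dt3` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'dt': pd.to_datetime([
    '2014-10-01 10:02:45',
    '2014-10-01 13:08:17',
    '2014-10-01 17:39:24',
])})

# Period route: the result is a PeriodIndex, not a datetime column.
periods = pd.DatetimeIndex(df['dt']).to_period('h')

# Converting back gives the timestamp at the start of each hourly period.
df['dt2'] = periods.to_timestamp()

# Assumption: Series.dt.floor is available in the installed pandas.
# It rounds each timestamp down to the start of its hour.
df['dt3'] = df['dt'].dt.floor('h')

print(df)
```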
