harbun - 7 months ago
Python Question

Setting datetime64 series as pandas dataframe index automatically adds timezone offset

I am reading a CSV file with datetimes that carry no timezone data, but once I use the datetime column as the index, an (incorrect) timezone offset is added. How can I prevent this from happening?

The data:

Time (UTC),Open,High,Low,Close,Volume
2005.01.03 00:00:00,1.8275,1.858,1.7971,1.819,41998.5
2005.01.10 00:00:00,1.8095,1.8376,1.771,1.766,46353.9

It is weekly OHLC data.

import pandas as pd
df = pd.read_csv("test.csv", parse_dates=["Time (UTC)"])

After reading in the data, there is no timezone offset:

df["Time (UTC)"].head(2)
0 1973-02-26
1 1973-03-05
Name: Time (UTC), dtype: datetime64[ns]

But when I set this data as index, a timezone offset is added:

df.index = df["Time (UTC)"]
df.index.values[:1]
array(['1973-02-26T01:00:00.000000000+0100'], dtype='datetime64[ns]')

If I check the timezone of the index, I get back that none is set, so no timezone has been added even though an offset is displayed (which, by the way, seems to follow summer time too). If I set the timezone to UTC with
df = df.tz_localize("UTC")
the index shows dtype='datetime64[ns, UTC]'. However, this has no effect on the offsets.

Since I know what timezone the data is in, I don't need a timezone offset, much less an incorrect one that is probably based on my machine's timezone.
I would rather have ["Time (UTC)"] column set as index upon using pd.read_csv for performance reasons, but I get the same behavior when doing that.

How can I prevent a timezone offset from being added, or set the correct one?

My python version is 2.7.11 (Anaconda 2.5.0 64 Bit), pandas version is 0.17.1, numpy 1.10.4.


This is solely a display issue - your dates are still timezone-naive, it's just that older numpy versions render datetime64 values with the machine's local UTC offset in the repr.

If you upgrade to a more recent numpy (1.11+), it will fix the display issue.

In [31]: np.__version__
Out[31]: '1.11.1'

In [32]: df.index.values[:1]
Out[32]: array(['2005-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
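
If you want to confirm that the stored values are untouched regardless of how numpy prints them, a check along these lines may help. It is only a sketch: the CSV text is inlined from the question's sample rows instead of being read from test.csv, and the explicit format string is my assumption about how the "2005.01.03 00:00:00" timestamps should be parsed.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question; inlined here so the sketch is self-contained.
csv_text = """Time (UTC),Open,High,Low,Close,Volume
2005.01.03 00:00:00,1.8275,1.858,1.7971,1.819,41998.5
2005.01.10 00:00:00,1.8095,1.8376,1.771,1.766,46353.9
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse with an explicit format (an assumption about the file layout)
# so the result does not depend on pandas' format inference.
df["Time (UTC)"] = pd.to_datetime(df["Time (UTC)"], format="%Y.%m.%d %H:%M:%S")
df = df.set_index("Time (UTC)")

# A timezone-naive index has no tz attached.
print(df.index.tz)

# Compare against the expected value instead of trusting the array repr.
first_matches = df.index.values[0] == np.datetime64("2005-01-03T00:00:00")
print(first_matches)

# Localizing to UTC attaches a timezone without shifting the wall-clock time.
df_utc = df.tz_localize("UTC")
print(df_utc.index[0].isoformat())
```

If the comparison is true and the localized timestamp still reads midnight, the offset in the old repr was purely cosmetic and the data can be used as-is.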