harbun - 12 days ago
Python Question

Setting datetime64 series as pandas dataframe index automatically adds timezone offset

I am reading a CSV file whose datetimes carry no timezone data, but once I use the datetime column as the index, an (incorrect) timezone offset is added. How can I prevent this from happening?

The data:

Time (UTC),Open,High,Low,Close,Volume
2005.01.03 00:00:00,1.8275,1.858,1.7971,1.819,41998.5
2005.01.10 00:00:00,1.8095,1.8376,1.771,1.766,46353.9


It is weekly OHLC data.

import pandas as pd
df = pd.read_csv("test.csv", parse_dates=["Time (UTC)"])


After reading in the data, there is no timezone offset:

in:
df["Time (UTC)"].head(2)
out:
0 1973-02-26
1 1973-03-05
Name: Time (UTC), dtype: datetime64[ns]


But when I set this data as index, a timezone offset is added:

in:
df.index = df["Time (UTC)"]
df.index.values[:1]
out:
array(['1973-02-26T01:00:00.000000000+0100'], dtype='datetime64[ns]')


Using df.index, I get back dtype='datetime64[ns]', so no timezone is attached even though a timezone offset is shown (which, by the way, seems to follow daylight saving time too). If I set the timezone to UTC with df = df.tz_localize("UTC"), df.index shows me dtype='datetime64[ns, UTC]'. However, that has no effect on the offsets.

Since I know what timezone the data is in, I don't need a timezone offset, much less an incorrect one that is probably based on my machine's timezone.
I would rather have the "Time (UTC)" column set as the index directly in pd.read_csv for performance reasons, but I get the same behavior when doing that.

How can I prevent a timezone offset from being added, or set the correct one?

My Python version is 2.7.11 (Anaconda 2.5.0, 64-bit), pandas is 0.17.1, and numpy is 1.10.4.

Answer

This is solely a display issue. Your dates are still timezone-naive; numpy versions before 1.11 simply render datetime64 values in the machine's local timezone, which is why an offset shows up in the repr.
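
If you want to convince yourself that the stored values are unaffected, a minimal sketch along these lines may help. It rebuilds the two sample rows in memory (the csv_text variable and the explicit format string are my own assumptions, not part of the question) and checks the index through pandas rather than through the numpy repr:

```python
import io

import pandas as pd

# The two sample rows from the question, inlined so the sketch is self-contained.
csv_text = """Time (UTC),Open,High,Low,Close,Volume
2005.01.03 00:00:00,1.8275,1.858,1.7971,1.819,41998.5
2005.01.10 00:00:00,1.8095,1.8376,1.771,1.766,46353.9
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse with an explicit format (an assumption about the file layout).
df["Time (UTC)"] = pd.to_datetime(df["Time (UTC)"], format="%Y.%m.%d %H:%M:%S")
df = df.set_index("Time (UTC)")

# A timezone-naive index has no tz attached.
index_is_naive = df.index.tz is None

# The first timestamp is still midnight; nothing has been shifted by an hour.
first_timestamp = df.index[0]
epoch_seconds = (first_timestamp - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

# tz_localize only attaches a timezone label; for UTC the stored instants are unchanged.
localized = df.tz_localize("UTC")
same_instant = bool(localized.index.values[0] == df.index.values[0])

print(index_is_naive, first_timestamp, epoch_seconds, same_instant)
```

The last check is also why tz_localize("UTC") appeared to have "no effect on the offsets": the underlying values stay the same, and only the way they are printed differs.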

Upgrading to a more recent numpy (1.11+) fixes the display issue:

In [31]: np.__version__
Out[31]: '1.11.1'

In [32]: df.index.values[:1]
Out[32]: array(['2005-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
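
If upgrading numpy is not an option, one possible workaround (a suggestion of mine, assuming your pandas version provides DatetimeIndex.strftime) is to format the index through pandas instead of looking at the raw .values array, so the numpy repr never comes into play:

```python
import pandas as pd

# A small timezone-naive index standing in for df.index from the question.
index = pd.DatetimeIndex(["2005-01-03", "2005-01-10"])

# Format through pandas; this does not go through numpy's datetime64 repr.
formatted = list(index.strftime("%Y-%m-%dT%H:%M:%S"))

print(formatted)
```

This only changes how the values are shown; the index itself is left untouched and stays timezone-naive.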