anshanno anshanno - 3 months ago 16
Python Question

Why can't I apply pandas.DatetimeIndex to multiple columns?

I am trying to drop the time portion on several pandas columns using the following code:

group_df['submitted_on'] = pd.DatetimeIndex(group_df['submitted_on']).to_period('d')
group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on']).to_period('d')


This works fine for the first column but I can't seem to figure out why I cant apply it ti multiple columns.

I am getting the following error attempting to execute the second line:

File "C:/Users/anshanno/PycharmProjects/RETIvizScript/RetiViz.py", line 271, in join_groups
group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on']).to_period('d')
File "C:\Python27\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python27\lib\site-packages\pandas\tseries\index.py", line 349, in __new__
values, freq=freq, dayfirst=dayfirst, yearfirst=yearfirst)
File "pandas\tslib.pyx", line 2347, in pandas.tslib.parse_str_array_to_datetime (pandas\tslib.c:42450)
ValueError


Since the ValueError isn't telling me anything, I tried
errors='coerce'
without any luck - I still get the same undescriptive error.

group_df['resolved_on'] = pd.DatetimeIndex(group_df['resolved_on'], errors='coerce').to_period('d')


Edit (Sample Data):

"identifier","status","submitted_on","resolved_on","closed_on","duplicate_on","junked_on","unproducible_on","verified_on"
"xx1","D","2004-07-28 07:00:00.0","null","null","2004-08-26 07:00:00.0","null","null","null"
"xx2","N","2010-03-02 03:00:16.0","null","null","null","null","null","null"
"xx3","U","2005-10-26 14:20:20.0","null","null","null","null","2005-11-01 13:02:22.0","null"
"xx4","V","2006-06-30 07:00:00.0","2006-09-15 07:00:00.0","null","null","null","null","2006-11-20 08:00:00.0"
"xx5","R","2012-09-21 06:30:58.0","2013-06-06 09:35:25.0","null","null","null","null","null"
"xx6","D","2009-11-25 02:16:03.0","null","null","2010-02-26 12:28:22.0","null","null","null"
"xx7","D","2003-08-29 07:00:00.0","null","null","2003-08-29 07:00:00.0","null","null","null"
"xx8","R","2003-06-06 12:00:00.0","2003-06-24 12:00:00.0","null","null","null","null","null"
"xx9","R","2004-11-05 08:00:00.0","2004-11-15 08:00:00.0","null","null","null","null","null"
"xx10","R","2008-02-21 05:13:39.0","2008-09-25 17:20:57.0","null","null","null","null","null"
"xx11","R","2007-03-08 17:47:44.0","2007-03-21 23:47:57.0","null","null","null","null","null"
"xx12","R","2011-08-22 19:50:25.0","2012-06-21 05:52:12.0","null","null","null","null","null"
"xx13","J","2003-07-07 12:00:00.0","null","null","null","2003-07-10 12:00:00.0","null","null"
"xx14","A","2008-09-24 11:36:34.0","null","null","null","null","null","null"


Thanks guys, any help is appreciated.

Answer

Use pd.to_datetime instead of pd.DatetimeIndex

group_df['submitted_on'] = pd.to_datetime(group_df['submitted_on'], 'coerce').dt.to_period('d')
group_df['resolved_on'] = pd.to_datetime(group_df['resolved_on'], 'coerce').dt.to_period('d')

group_df

enter image description here