doctorer - 29 days ago
Python Question

convert pandas dataframe column to np.datetime64

I want to add a np.datetime64 column to a pandas dataframe that has been read from a .csv file containing columns for year, month, day, hour and minute, and use it as an index. I have combined the separate columns to make a column of datetime strings.

import numpy as np
import pandas as pd

filename = 'test.csv'
# read the year, month, day, hour and minute columns of the csv file as strings
df = pd.read_csv(filename, header=0, usecols=[2, 3, 4, 5, 6],
                 names=['y', 'm', 'd', 'h', 'min'],
                 dtype={'y': 'str', 'm': 'str', 'd': 'str', 'h': 'str', 'min': 'str'})
# build one 'YYYY-MM-DD HH:MM:00' string per row
df['datetimetext'] = (df['y'] + '-' + df['m'] + '-' + df['d'] + ' ' + df['h'] + ':' + df['min'] + ':00')


So the dataframe looks like this:

      y   m   d   h  min         datetimetext
0  1993  09  06  00   30  1993-09-06 00:30:00
1  1993  09  06  01   00  1993-09-06 01:00:00
2  1993  09  06  01   30  1993-09-06 01:30:00
3  1993  09  06  02   00  1993-09-06 02:00:00
4  1993  09  06  02   30  1993-09-06 02:30:00
...


Now I want to add a column with the datetime formatted as np.datetime64.


I want to write

df['datetime'] = np.datetime64(df['datetimetext'])


but that raises an error:

ValueError: Could not convert object to NumPy datetime


Do I need to iterate through each row of the dataframe, or is there a more elegant solution?

Answer

The easiest way, given what you already have, is:

df['datetime'] = pd.to_datetime(df['datetimetext'])

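As a minimal sketch of that one-liner in context: the three rows below are a hypothetical stand-in for the question's datetimetext column, and the set_index step is just one way to make the new column the index, as the question asks. The values stored in the resulting index are np.datetime64, so no per-row loop should be needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the 'datetimetext' column from the question
df = pd.DataFrame({'datetimetext': ['1993-09-06 00:30:00',
                                    '1993-09-06 01:00:00',
                                    '1993-09-06 01:30:00']})

# Vectorised conversion of the whole column of strings
df['datetime'] = pd.to_datetime(df['datetimetext'])

# Use the converted column as the index
df = df.set_index('datetime')

# The underlying values of the index are NumPy datetime64 scalars
first = df.index.values[0]
print(df.index.dtype, type(first))
```

np.datetime64 itself builds a single scalar, which is likely why passing it a whole Series fails, whereas pd.to_datetime works on the entire column at once.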

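However, pd.to_datetime can also assemble datetimes directly from a dataframe whose columns are named after the date and time components (year, month, day, plus optional time units), so you can skip the string column entirely. I renamed your columns using rename: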

m = dict(y='year', m='month', d='day', h='h', min='m')
# rename columns and get rid of datetimetext
df = df[['y', 'm', 'd', 'h', 'min']].rename(columns=m)
df


Next, I'll perform the conversion and assign the result to the index in one shot:

df.index = pd.to_datetime(df)
df

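For reference, here is a self-contained sketch of the same approach. The two rows are a hypothetical sample copied from the top of the question's data, and the spelled-out names 'hour' and 'minute' are used here instead of 'h' and 'm' purely for readability. Renaming into a temporary frame (called parts here, a name chosen for illustration) leaves the original column names untouched.

```python
import pandas as pd

# Hypothetical rows mirroring the question's string-typed columns
df = pd.DataFrame({'y': ['1993', '1993'],
                   'm': ['09', '09'],
                   'd': ['06', '06'],
                   'h': ['00', '01'],
                   'min': ['30', '00']})

# Temporary copy with column names that pd.to_datetime recognises
parts = df.rename(columns={'y': 'year', 'm': 'month', 'd': 'day',
                           'h': 'hour', 'min': 'minute'})

# Assemble one timestamp per row and use the result as the index
df.index = pd.to_datetime(parts)
print(df)
```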