Don Don - 3 months ago 25
Python Question

Extract year from YYYYMMDD column in Pandas DataFrame

I have a pandas DataFrame in which I would like to create an additional column containing only the year which I extract from a column in YYYYMMDD format.
When searching the forum I found the to_datetime command, but it didn't work in my case.

I tried the following:

import pandas as pd

df = pd.DataFrame({'name': ['A', 'B'],
                   'date': [20130102, 20140511]})

df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year


What I get as output is:

date name year
0 1970-01-01 00:00:00.020130102 A 1970
1 1970-01-01 00:00:00.020140511 B 1970


But I would like to get:

date name year
0 20130102 A 2013
1 20140511 B 2014


I also tried it without to_datetime, since my date is not exactly in the yyyy-mm-dd format, but I couldn't get it to work that way either.
I hope you can help me with this 'newbie' problem, thanks a lot!

Answer

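You need to specify the format in which you're providing the date. Without it, to_datetime treats the integers as nanoseconds since 1970-01-01, which is why every row comes out as 1970.

df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')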
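
Note that the line above overwrites the date column with datetimes, while the desired output in the question keeps the original 20130102-style values. A minimal sketch of one way to get that, assuming the column holds valid 8-digit integers: parse into a temporary Series (the name parsed is just illustrative) and take the year from it. The astype(str) step is a precaution so the format string is matched against text. An integer-division alternative is shown as well; it skips date parsing and therefore does no validation.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B'],
                   'date': [20130102, 20140511]})

# Parse into a separate Series so the original integer column stays untouched.
parsed = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df['year'] = parsed.dt.year

# Alternative without date parsing: drop the last four digits (MMDD).
# This does not check that the values are real dates.
df['year_from_int'] = df['date'] // 10000

print(df)
```

The to_datetime route raises an error on values that are not valid dates, which may be preferable if the data could contain bad entries.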