vagabond - 1 month ago 16
R Question

String to Datetime conversion - specifying the year in a day-month date

I am an R user trying to learn Python.

I am working on a pandas DataFrame in Python that has a date column of dtype object.

df['Date']
0 1-Mar
1 1-Mar
2 1-Mar
3 1-Mar
4 1-Mar
5 1-Mar


I tried to convert this column to datetime using:

pd.to_datetime(df['Date'], format = "%d-%b")


The result I got looks like:

0 1900-03-01
1 1900-03-01
2 1900-03-01
3 1900-03-01
4 1900-03-01
5 1900-03-01


This seems a little strange to me, because when I do the same thing in R using:

as.Date(df$Date, format = "%d-%b")


I get what I expect:

[1] "2016-03-01" "2016-03-01" "2016-03-01" "2016-03-01"
[5] "2016-03-01" "2016-03-01"


Two questions arise:

1) Why is R assuming I want the current year, and what if I don't want the current year?

2) In Python, using pandas, how do I specify the year I want, and also the timezone?

Thanks.

Answer

1) Why is R assuming I want the current year and what if I don't want the current year?

R assumes the current year because you have, in effect, asked it to guess. When you give R 1-Mar without a year, the result may be system-specific; the most common behaviour is to treat a missing year, month or day as the current one. So after the conversion you get the current year. If you do not want the current year, tell R which year you want by including it in the string, as shown below.
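If you actually want R's behaviour in pandas (the current year filled in), a minimal sketch is to append the current year yourself before parsing, since pandas will not do it for you. `pd.Timestamp.now().year` supplies the year; the `Date` column and sample data simply mirror the question.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["1-Mar"] * 6})

# Mimic R's common default: fill the missing year with the current one.
current_year = pd.Timestamp.now().year

# Append the year so the string is complete, then parse with a matching format.
df["Date"] = pd.to_datetime(df["Date"] + "-" + str(current_year), format="%d-%b-%Y")
print(df["Date"])
```

Because the year is computed at run time, the same code gives a different result next year, which is exactly the kind of implicit assumption worth being aware of.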

2) In Python, using Pandas - how do I specify the year I want and also the timezone?

Again, you can't expect either Python or R to return a meaningful date when the string you give them is incomplete. By giving pandas a date string without a year, you leave it to the library's developers to decide what the year should be; here pandas fell back to 1900, the same default year that Python's own datetime.strptime uses. In either language, you can force the year to be 2016 by appending it to the string before parsing:

Pandas:

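import pandas as pd
df1 = pd.DataFrame(data={'Date': ['1-Mar'] * 6})
# Append the year so the string is complete, then parse with a matching format
df1['Date'] = pd.to_datetime(df1['Date'] + "-2016", format="%d-%b-%Y")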

Yields:

0   2016-03-01
1   2016-03-01
2   2016-03-01
3   2016-03-01
4   2016-03-01
5   2016-03-01
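The question also asks about the timezone. A minimal sketch that wraps the same append-then-parse idea in a helper and optionally attaches a timezone: `parse_day_month` is a hypothetical name for illustration, not a pandas function, and "UTC" is only an example; any IANA zone name such as "America/New_York" should also be accepted by `tz_localize`.

```python
from typing import Optional

import pandas as pd


def parse_day_month(dates: pd.Series, year: int, tz: Optional[str] = None) -> pd.Series:
    """Parse 'D-Mon' strings such as '1-Mar' into datetimes in a chosen year.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Make every field explicit by appending the year before parsing.
    parsed = pd.to_datetime(dates.astype(str) + "-" + str(year), format="%d-%b-%Y")
    if tz is not None:
        # Attach a timezone to the naive timestamps (clock times are unchanged).
        parsed = parsed.dt.tz_localize(tz)
    return parsed


df1 = pd.DataFrame({"Date": ["1-Mar"] * 6})
df1["Date"] = parse_day_month(df1["Date"], 2016, tz="UTC")
print(df1["Date"])
```

Note that `tz_localize` only labels naive times with a zone; if the timestamps are already zone-aware and you want to view them in another zone, `Series.dt.tz_convert` is the method for that.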

R:

df1 <- data.frame(Date = rep('1-Mar',6))
as.Date(paste(df1$Date,"2016",sep = "-"), format = "%d-%b-%Y")

Yields:

"2016-03-01" "2016-03-01" "2016-03-01" "2016-03-01" "2016-03-01" "2016-03-01"

You can make the year anything you like, but you can't expect the language or library to guess the year you want. How to fill in missing date fields is a design decision, and R and pandas made different ones: R typically uses the current year, while pandas used 1900. The bottom line: if your date strings are incomplete, supply the missing parts yourself rather than relying on whatever default you are given.
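To make the risk concrete, here is a small sketch using Python's standard `datetime.strptime` (not pandas itself, though pandas fell back to the same 1900 default in the question). With no year, a leap day cannot be parsed at all, because 1900 is not a leap year. It assumes an English locale, since `%b` matches month abbreviations according to the current locale; recent Python versions also emit a DeprecationWarning for year-less day-of-month formats, which the sketch silences.

```python
import warnings
from datetime import datetime

no_year_error = None
with warnings.catch_warnings():
    # Recent Python versions warn that a day of month without a year is ambiguous.
    warnings.simplefilter("ignore", DeprecationWarning)
    # With no year in the format, Python fills in 1900.
    default_year = datetime.strptime("1-Mar", "%d-%b").year
    try:
        # 1900 is not a leap year, so 29 February does not exist in it.
        datetime.strptime("29-Feb", "%d-%b")
    except ValueError as exc:
        no_year_error = str(exc)

# Supplying the year yourself removes the guesswork.
leap_day = datetime.strptime("29-Feb-2016", "%d-%b-%Y")

print(default_year)
print(no_year_error)
print(leap_day.date())
```

This is one more reason to append the year before parsing rather than parsing first and patching the year afterwards.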

I hope this helps.