Erebus Erebus - 10 months ago 129
R Question

Factor to date conversion produces NA

I'm working on a Kaggle dataset. Here is a sample of the data:

Before:

Date Open High Low Close Volume Adj.Close
1 6/29/2010 19.00 25.00 17.54 23.89 18766300 23.89
2 6/30/2010 25.79 30.42 23.30 23.83 17187100 23.83
3 7/1/2010 25.00 25.92 20.27 21.96 8218800 21.96
4 7/2/2010 23.00 23.10 18.71 19.20 5139800 19.20
5 7/6/2010 20.00 20.00 15.83 16.11 6866900 16.11
6 7/7/2010 16.40 16.63 14.98 15.80 6921700 15.80


The classes of the columns, from left to right, are: factor, numeric, numeric, numeric, numeric, integer, numeric.

I applied this line of code to convert the Date column from a factor to a Date:

data$Date <- as.Date(data$Date, format = "%d/%m/%Y")


After that I ran sapply(data, class) and is.factor(data$Date) again to check, and the class had changed as expected.
But here's the problem:

Date Open High Low Close Volume Adj.Close
1 <NA> 19.00 25.00 17.54 23.89 18766300 23.89
2 <NA> 25.79 30.42 23.30 23.83 17187100 23.83
3 2010-01-07 25.00 25.92 20.27 21.96 8218800 21.96
4 2010-02-07 23.00 23.10 18.71 19.20 5139800 19.20
5 2010-06-07 20.00 20.00 15.83 16.11 6866900 16.11
6 2010-07-07 16.40 16.63 14.98 15.80 6921700 15.80


My dataset is 1692x7. After the conversion I counted the NAs and got 1021 (60% of the data).

Does anyone know a better way to convert a factor to a Date without getting all these NAs?

Answer

You need to use as.Date(df$Date, format = "%m/%d/%Y") instead of as.Date(data$Date, format = "%d/%m/%Y"). Here df stands for your data frame, which is called data in the question.

as.Date(df$Date, format = "%m/%d/%Y")
# [1] "2010-06-29" "2010-06-30" "2010-07-01" "2010-07-02" "2010-07-06"
# [6] "2010-07-07"

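Your dates are month/day/year, not day/month/year. With "%d/%m/%Y", R reads the second field as the month, so every row whose day is greater than 12 (such as 6/29/2010) fails to parse and becomes NA. The remaining rows parse, but with day and month swapped: 7/1/2010 becomes 2010-01-07 instead of 2010-07-01.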
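To see the difference between the two format strings side by side, here is a minimal sketch. The vector dates is a hypothetical stand-in for the Date column, built from the six rows shown in the question; it is not the full Kaggle dataset.

```r
# Hypothetical stand-in for data$Date: the six sample rows from the question,
# stored as a factor like the original column
dates <- factor(c("6/29/2010", "6/30/2010", "7/1/2010",
                  "7/2/2010", "7/6/2010", "7/7/2010"))

# Day-first format: "29" and "30" are not valid months, so those rows become NA
wrong <- as.Date(dates, format = "%d/%m/%Y")

# Month-first format: matches how the strings are actually written
right <- as.Date(dates, format = "%m/%d/%Y")

# Count the NAs produced by each format string
sum(is.na(wrong))
sum(is.na(right))
```

Running sum(is.na(...)) on the converted column right after the conversion is a cheap way to catch a mismatched format string before it affects later analysis.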
