jdepypere - 9 months ago
R Question

R as.POSIXct() dropping hours minutes and seconds

I am experimenting with R to analyse some measurement data. I have a .csv file containing more than 2 million lines of measurement. Here is an example:

2014-10-22 21:07:03+00:00,7432442.0
2014-10-22 21:07:21+00:00,7432443.0
2014-10-22 21:07:39+00:00,7432444.0
2014-10-22 21:07:57+00:00,7432445.0
2014-10-22 21:08:15+00:00,7432446.0
2014-10-22 21:08:33+00:00,7432447.0
2014-10-22 21:08:52+00:00,7432448.0
2014-10-22 21:09:10+00:00,7432449.0
2014-10-22 21:09:28+00:00,7432450.0

After reading in the file, I want to convert the time column to a proper date-time using as.POSIXct(). For small files this works fine, but for large files it does not.

I made an example by reading in a big file, creating a copy of a small portion, and then applying as.POSIXct() to the correct column. I included an image of the result. As you can see, when applied to the small copy, it correctly keeps the hours, minutes and seconds. However, when applied to the whole file, only the date is stored. It also takes a lot of time (more than 2 minutes).

[Image: POSIXct() error]

What could cause this anomaly? Is it due to some system limit, since I'm running this on my laptop?


On my Windows 7 device I run R 3.1.3, which shows this problem. However, on Ubuntu 14.01, running R 3.0.2, the times are kept for the large files. I just noticed there is a newer version (3.2.0) for Windows; I will update and check whether the issue persists.

Answer Source

You can try the code below, which uses data.table.
It will:

  • read the datetime column as character instead of factor
  • update the column by reference

library(data.table)  # provides fread() and :=
data <- fread("C:/RData/house2_electricity_main.csv")
data[, V1 := as.POSIXct(V1)]  # converts V1 in place, by reference
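
When no format is given, as.POSIXct() has to pick a single format for the whole vector. A minimal sketch of passing the format and time zone explicitly, using two timestamps from the question plus one hypothetical malformed row (the malformed row and the name bad_rows are illustrative assumptions, not taken from the actual file):

```r
# Two timestamps copied from the question, plus one hypothetical malformed row
stamps <- c("2014-10-22 21:07:03+00:00",
            "2014-10-22 21:07:21+00:00",
            "not a timestamp")

# Explicit format and time zone, so nothing is guessed across the vector.
# The trailing "+00:00" is ignored by the parser; that is harmless here only
# because every offset in the sample is zero (an assumption for the full file).
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Rows that do not match the format become NA and can be inspected
bad_rows <- which(is.na(parsed))

format(parsed[1:2], "%Y-%m-%d %H:%M:%S")
```

If bad_rows is not empty on the real file, those rows are worth looking at before converting the whole column.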

There was a recent question about using fasttime::fastPOSIXct instead of as.POSIXct, which can speed up the conversion further.
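
A rough sketch of what that could look like, assuming the fasttime package is installed and that all timestamps are in UTC, as in the sample from the question. The variable names are illustrative, and the base R fallback is only there so the sketch runs without fasttime:

```r
# Timestamps copied from the question
stamps <- c("2014-10-22 21:07:03+00:00",
            "2014-10-22 21:07:21+00:00")

# Keep only "YYYY-MM-DD HH:MM:SS"; dropping the offset is safe here only
# because every offset in the sample is +00:00 (assumption for the full file)
clean <- substr(stamps, 1, 19)

if (requireNamespace("fasttime", quietly = TRUE)) {
  # fastPOSIXct() interprets its input as UTC/GMT
  parsed <- fasttime::fastPOSIXct(clean, tz = "UTC")
} else {
  # Base R fallback so the sketch still runs without fasttime installed
  parsed <- as.POSIXct(clean, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

format(parsed, "%H:%M:%S")
```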

As for the title question: once you have a POSIXct column you can round it quite freely, e.g. with the functions year, month, mday, ...

data[, .SD, by = .(year(V1),month(V1),mday(V1))]
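
As an illustration of the same grouping with an aggregate instead of .SD, here is a small sketch on three readings from the question. The column names n, max_value, yr, mon and day are made up for the example:

```r
library(data.table)

# Three readings copied from the question
data <- data.table(
  V1 = as.POSIXct(c("2014-10-22 21:07:03",
                    "2014-10-22 21:07:21",
                    "2014-10-22 21:07:39"),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  V2 = c(7432442, 7432443, 7432444)
)

# One row per calendar day: number of readings and the largest V2 value
daily <- data[, .(n = .N, max_value = max(V2)),
              by = .(yr = year(V1), mon = month(V1), day = mday(V1))]

print(daily)
```

Swapping mday for hour (also provided by data.table) would group by hour of day instead.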