Daniel Campos - 2 months ago
R Question

How do I keep the timestamp of a file downloaded over FTP with R?

I am currently trying to download files over FTP (with R), but I want to keep the source timestamp (last modified date).

I know that download.file (from the {utils} package) accepts additional command-line options through its extra argument, and I read on the web that -R or --remote-time should do the trick. But with the code below, the downloaded file's modified date is still set to the date (and time) of the download.

download.file(url = "ftp://ftp.datasus.gov.br/dissemin/publicos/SIASUS/200801_/Dados/ABAC1502.dbc",
              destfile = "C:/LocalPath/ABAC1502.dbc",
              quiet = TRUE,
              mode = "wb",
              method = "libcurl",
              extra = "--remote-time")


Am I missing something here?

I have also tried it on other FTP servers with no success.

More details: RStudio v0.99.484, R v3.3.1 (x64), OS Windows 7 Enterprise SP1

Answer

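I couldn't get that site to load. I think your problem is solved by switching method from "libcurl" to "curl": download.file() only passes extra to the external wget and curl programs, so with "libcurl" the --remote-time flag is never used. Still, here is a more generic solution (tested on macOS and Windows) against an FTP site that I know works: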

library(curl)
library(Rcpp)
library(inline)

h <- new_handle()
# filetime = TRUE asks the server for the remote modification time;
# verbose = TRUE is only there for my debugging
handle_setopt(h, filetime = TRUE, verbose = TRUE)
# h now holds the response; h$modified is the remote modification time
h <- curl_fetch_disk("ftp://ftp.ngdc.noaa.gov/STP/SOLAR_DATA/AIRGLOW/IGYDATA/abst5270",
                     "abst5270", handle = h)

h$modified
## [1] "1999-10-22 18:59:10 EDT"

as.numeric(h$modified)
## [1] 940633150

# Set a file's modification time (seconds since the epoch) via utime(),
# keeping its current access time
set_modtime <- rcpp(sig = c(path = "character", ts = "integer"), body = "
  struct stat f_stat;
  struct utimbuf ftp_time;
  std::string file_path = as<std::string>(path);
  long file_ts = as<long>(ts);
  if (stat(file_path.c_str(), &f_stat) >= 0) {
    ftp_time.actime = f_stat.st_atime;  // keep the access time
    ftp_time.modtime = file_ts;         // new modification time
    utime(file_path.c_str(), &ftp_time);
  }
", includes = c("#include <time.h>", "#include <utime.h>", "#include <sys/stat.h>"))

# Changes it to way back in the past
invisible(set_modtime("abst5270", as.numeric(h$modified)))

# Changes it back to right now
invisible(set_modtime("abst5270", as.numeric(Sys.time())))

It would need some extra checking and exception handling in a package, but this should work fine in a script.
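If compiling C++ through Rcpp/inline is inconvenient (on Windows it needs Rtools), base R's Sys.setFileTime() can set the modification time directly. This is a minimal sketch, assuming the h$modified value from the script above; the wrapper name set_modtime_base is hypothetical, and depending on the platform Sys.setFileTime() may also update the last-access time, which the utime() version keeps:

```r
# Hypothetical helper: set a file's modification time with base R only
# (no compiler needed), as an alternative to the Rcpp/utime() function.
set_modtime_base <- function(path, ts) {
  if (!file.exists(path)) stop("no such file: ", path)
  # h$modified may be NA when the server does not report a time
  if (length(ts) != 1L || is.na(ts)) {
    warning("no modification time available for ", path)
    return(invisible(FALSE))
  }
  # Accept a POSIXct or seconds since the epoch
  ok <- Sys.setFileTime(path, as.POSIXct(ts, origin = "1970-01-01"))
  invisible(isTRUE(ok))
}

# Usage with the result of curl_fetch_disk() from the script above:
# set_modtime_base("abst5270", h$modified)   # back to 1999-10-22
# set_modtime_base("abst5270", Sys.time())   # back to right now
# file.mtime("abst5270")                     # check the result
```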

Note that you have to use either a full path or a path relative to the current working directory (that may be obvious, but I wanted to make it explicit).
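For the original download.file() call, the smaller fix is to switch to the external curl program, since the extra argument is only used by the "wget" and "curl" methods. A sketch, assuming a curl executable is on the PATH (Windows 7 does not ship one, so it would have to be installed separately); download_keep_time is a hypothetical name:

```r
# Hypothetical wrapper: method = "curl" runs the external curl program,
# which is what actually receives the `extra` options
# (method = "libcurl" does not use them).
download_keep_time <- function(url, destfile) {
  status <- download.file(url = url, destfile = destfile,
                          quiet = TRUE,
                          mode = "wb",
                          method = "curl",
                          extra = "--remote-time")  # same as -R
  if (status != 0L) stop("download failed (status ", status, ")")
  # The local file now carries the server's modification time
  file.mtime(destfile)
}

# Usage (requires curl on the PATH and network access):
# download_keep_time(
#   "ftp://ftp.datasus.gov.br/dissemin/publicos/SIASUS/200801_/Dados/ABAC1502.dbc",
#   "C:/LocalPath/ABAC1502.dbc")
```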
