ruser9575ba6f ruser9575ba6f - 2 months ago 24
R Question

untar gzcon in memory in R

In R, how can I untar a gzcon in memory?


I need to perform some operations on a .tar.gz file in memory, and it is important that the file never be written to disk. The file is initially downloaded into memory, which results in an object similar to the example data below.

If I then call untar on the object, it writes the data in the tarfile to disk, which is undesirable.

Example data (a .tar.gz containing a single file with the content hello world!):

res <- structure(list(url = "s",
status_code = 0L, headers = raw(0), modified = structure(1479765215L, class = c("POSIXct",
"POSIXt")), times = structure(c(0, 0, 0, 0, 0.312, 0.312), .Names = c("redirect",
"namelookup", "connect", "pretransfer", "starttransfer",
"total")), content = as.raw(c(0x1f, 0x8b, 0x08, 0x00, 0xdf,
0x6c, 0x33, 0x58, 0x00, 0x03, 0xed, 0xce, 0x3d, 0x0a, 0xc2,
0x50, 0x10, 0xc4, 0xf1, 0xad, 0x73, 0x8a, 0xe7, 0x05, 0x64,
0x37, 0x79, 0xd9, 0x9c, 0x47, 0x30, 0x90, 0xe2, 0x49, 0x20,
0x59, 0x3f, 0x8e, 0xaf, 0x22, 0x42, 0x2a, 0x4d, 0x13, 0x44,
0xf8, 0xff, 0x9a, 0x29, 0x66, 0x8a, 0x89, 0x7e, 0x8e, 0x7d,
0xdc, 0x42, 0x36, 0xa4, 0x0f, 0xee, 0xf9, 0x99, 0xd6, 0xb5,
0xba, 0xcc, 0x17, 0x73, 0xb1, 0x46, 0x2d, 0xbb, 0x7b, 0xa3,
0xad, 0xa8, 0x69, 0xae, 0x3b, 0x49, 0xba, 0xe5, 0xa9, 0xb7,
0xf3, 0x1c, 0x87, 0x29, 0x25, 0xb9, 0x9c, 0x3e, 0xef, 0xbe,
0xf5, 0x7f, 0x6a, 0xe8, 0x4b, 0x19, 0xd3, 0x75, 0x9c, 0xca,
0x71, 0x57, 0x55, 0xbf, 0x7e, 0x03, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x58,
0xeb, 0x0e, 0x02, 0xc4, 0x36, 0xca, 0x00, 0x28, 0x00, 0x00
))), .Names = c("url", "status_code", "headers", "modified",
"times", "content"))


It turns out parsing tarfiles is not that difficult to do. The core loop of utils:::untar2 is a good starting point for the implementation of an in-memory untar tool. Basically, the tarfile has the following structure:

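| 512-byte header | file data | 512-byte header | file data | ...

Each file's data is padded with zero bytes to a multiple of 512 bytes, and the archive ends with two zero-filled 512-byte blocks.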

The tar header format is described in more detail in the GNU manual for tar, and is composed of some file attributes, magic numbers, and a checksum.
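
To make the checksum part concrete, here is a minimal sketch of a header check. It assumes the ustar layout, in which the checksum field occupies bytes 149 to 156 of the header (1-based), is stored as octal digits, and is computed as the sum of all 512 header bytes with the checksum field itself counted as spaces. The function name tar_checksum_ok is made up for illustration and is not part of utils.

```r
# Hypothetical helper: verify the checksum of one 512-byte tar header.
# Assumes the ustar layout (checksum field at 1-based bytes 149:156).
tar_checksum_ok <- function(header) {
  stopifnot(is.raw(header), length(header) == 512L)

  # The stored checksum is octal digits followed by NUL and/or space;
  # keep only the bytes that are ASCII '0'..'7'.
  v <- as.integer(header[149:156])
  digits <- v[v >= 48L & v <= 55L]
  stored <- strtoi(rawToChar(as.raw(digits)), base = 8L)

  # The checksum is computed with the checksum field treated as spaces.
  header[149:156] <- as.raw(0x20)
  computed <- sum(as.integer(header))

  !is.na(stored) && stored == computed
}
```

A header that fails this check is usually a sign that the parser has lost track of the 512-byte block boundaries, so it is a cheap sanity test to run before trusting the size field.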

The pseudocode for the in-memory untar tool is straightforward:

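repeat {
  read the next 512-byte header
  if the header is all zero bytes, stop (end of archive)
  parse file attributes from the header
  for each 512-byte block in the file {
    write block to raw connection
  }
  store raw connection contents and file attributes in a file object
  add file object to list
}
return list of files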
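
The pseudocode above can be sketched in R roughly as follows. This is not the code from utils:::untar2, only a simplified illustration. The names gunzip_raw and untar_raw are made up for this example. It assumes the ustar field offsets (name in bytes 1 to 100, size in bytes 125 to 136, type flag in byte 157, all 1-based), keeps regular files only, ignores the ustar prefix field and GNU/pax long-name extensions, skips checksum verification, and slices the data out of the raw vector directly rather than copying it block by block.

```r
# Hypothetical helper: decompress gzip data held in a raw vector,
# reading through gzcon() so nothing is written to disk.
gunzip_raw <- function(x, chunk = 65536L) {
  con <- gzcon(rawConnection(x))
  on.exit(close(con))
  out <- list()
  repeat {
    buf <- readBin(con, what = "raw", n = chunk)
    if (length(buf) == 0L) break
    out[[length(out) + 1L]] <- buf
  }
  unlist(out)
}

# Hypothetical helper: parse an uncompressed tar archive held in a raw
# vector and return a named list with one entry per regular file.
untar_raw <- function(tar) {
  stopifnot(is.raw(tar))

  # Read a NUL-terminated ASCII header field as a string.
  field <- function(block, from, to) {
    b <- block[from:to]
    nul <- which(b == as.raw(0))
    if (length(nul)) b <- b[seq_len(nul[1L] - 1L)]
    rawToChar(b)
  }

  files <- list()
  pos <- 1L
  while (pos + 511L <= length(tar)) {
    header <- tar[pos:(pos + 511L)]
    if (all(header == as.raw(0))) break  # zero block marks end of archive

    name <- field(header, 1L, 100L)
    size <- strtoi(trimws(field(header, 125L, 136L)), base = 8L)
    if (is.na(size)) stop("cannot parse size field for entry: ", name)
    type <- header[157L]

    start <- pos + 512L
    if (size > 0L && start + size - 1L > length(tar)) {
      stop("archive is truncated in entry: ", name)
    }

    # Type flag "0" (or NUL in old archives) marks a regular file.
    if (type == as.raw(0x30) || type == as.raw(0)) {
      content <- tar[seq.int(start, length.out = size)]
      files[[name]] <- list(name = name, size = size, content = content)
    }

    # Data is padded to a multiple of 512 bytes; jump to the next header.
    pos <- start + 512L * as.integer(ceiling(size / 512))
  }
  files
}

# With the example data from the question, usage would look like:
# files <- untar_raw(gunzip_raw(res$content))
# rawToChar(files[[1]]$content)
```

Because strtoi returns NA for values beyond the integer range, this sketch stops on entries of 2 GB or more; archives like that would need a different way of parsing the size field.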