EpiBlake - 2 months ago
R Question

Checking if a tsv file that has a comment header is otherwise empty

I have a program that generates tab separated value files (*.tsv) as a part of a larger analysis pipeline. These tsv files have four comment header lines, each beginning with a # (example below with the line prefix and a description of what follows it).

#Version ProgramNameAndVersion
#CL CommandListing
#TEMPLATE-SDF-ID FileName
#abundance IgnoredHeaderNames


After this header, there is often a large number of data rows that I can read in and manipulate without problem. Rarely, though, the program will write out a tsv that still has the comment header lines but is otherwise empty. I am looking for a good way to check whether a tsv file is empty before I try to import it and get a "no lines available in input" error.

Normally, I would just use:

info=file.info(ListOfFileNames)
empty = rownames(info[info$size == 0, ])


as described here. However, the tsv files are not truly empty, just empty of data. I also can't simply move the size cutoff to a different fixed value, because the details in the header lines change from file to file, and I have found "empty" files that are larger than a file with a single line of data.

I would appreciate any help on a way of checking if these files do not contain any data in addition to the # header lines.

Answer

A few ways come to mind:

  • check the 5th line of the file before trying to read it in:

    length(readLines(filename, n = 5)[-(1:4)]) > 0
    
  • use readr; this may still give you an empty data.frame (a tibble, actually), but no error:

    readr::read_delim(filename, delim="\t", comment="#")
    
  • catch the error (perhaps overkill, perhaps better done using withCallingHandlers):

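    tryCatch(read.delim(filename, sep="\t", comment.char="#"),
             error = function(e) {
               # e is a condition object, so test its message text
               if (grepl("no lines available in input", conditionMessage(e), fixed = TRUE)) {
                 # header-only file: return an empty data.frame instead of failing
                 return(data.frame())
               } else {
                 # any other error is re-raised unchanged
                 stop(e)
               }
             })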
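
The first option relies on there being exactly four header lines. If that count could ever vary, a check that looks for any line not starting with # may be more robust. The following is only a minimal sketch under that assumption: has_data_rows, comment_prefix and chunk_size are hypothetical names, not part of any package, and the two temporary files stand in for real pipeline output.

```r
# Hypothetical helper: TRUE if the file holds at least one non-blank line
# that does not start with the comment prefix, FALSE otherwise.
has_data_rows <- function(path, comment_prefix = "#", chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  repeat {
    # read the file in chunks so large files are not loaded completely
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) {
      return(FALSE)  # reached end of file without finding a data line
    }
    is_data <- nzchar(trimws(lines)) & !startsWith(lines, comment_prefix)
    if (any(is_data)) {
      return(TRUE)   # stop at the first chunk that contains data
    }
  }
}

# Demonstration on two throwaway files (the data row is made up).
header <- c("#Version ProgramNameAndVersion", "#CL CommandListing",
            "#TEMPLATE-SDF-ID FileName", "#abundance IgnoredHeaderNames")
empty_file <- tempfile(fileext = ".tsv")
full_file  <- tempfile(fileext = ".tsv")
writeLines(header, empty_file)
writeLines(c(header, "0.5\ttaxonA"), full_file)

ListOfFileNames <- c(empty_file, full_file)
has_data <- vapply(ListOfFileNames, has_data_rows, logical(1), USE.NAMES = FALSE)
empty <- ListOfFileNames[!has_data]  # files holding only header lines
```

Here empty plays the same role as in the file.info snippet from the question, so the rest of the pipeline could skip those files before calling read.delim. Because the function returns as soon as it sees a data line, files with many rows should only need their first chunk read.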