Max Max - 2 months ago 7
R Question

How to read quoted text containing escaped quotes

Consider the following comma separated file. For simplicity let it contain one line:




'I am quoted','so, can use comma inside - it is not separator here','but can\'t use escaped quote :=('





If you try to read it with the command

table <- read.csv(filename, header=FALSE)


the line will be split into 4 parts, because it contains 3 commas. What I actually want is 3 parts, one of which contains a comma itself. This is where the quote argument comes in. I tried:

table <- read.csv(filename, header=FALSE, quote="'")


but that fails with the message "incomplete final line found by readTableHeader on table". That happens because the line contains an odd number (seven) of quotes.
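The counts mentioned above can be checked directly. This is only a small sketch: the variable names are made up for illustration, and the sample line is typed in as an R string (so the backslash is doubled) instead of being read from the file.

```r
# The sample line from the question; "\\'" in R source is the two characters \'
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"

# Count literal single quotes and commas in the line
n_quotes <- lengths(regmatches(line, gregexpr("'", line, fixed = TRUE)))
n_commas <- lengths(regmatches(line, gregexpr(",", line, fixed = TRUE)))

n_quotes  # 7: the escaped quote in "can\'t" leaves the quotes unbalanced
n_commas  # 3: hence 4 fields when quoting is ignored
```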

read.table() as well as scan() have a parameter allowEscapes, but setting it to TRUE doesn't help. That is expected, because help(scan) says:


The escapes which are interpreted are the control characters ‘\a, \b, \f, \n, \r, \t, \v’, ...
... Any other escaped character is treated as itself, including backslash


Please suggest how you would read such quoted CSV files containing escaped \' quotes.

Answer

One possibility is to use readLines() to read everything in as is, and then replace the quote character with something else, e.g.:

tt <- readLines("F:/temp/test.txt")
tt <- gsub("([^\\]|^)'","\\1\"",tt) # replace each unescaped ' by "
tt <- gsub("\\\\'","'",tt) # turn each escaped \' into a plain '

The vector tt can then be read through a textConnection:

zz <- textConnection(tt)
read.csv(zz,header=FALSE,quote="\"") # read from the text connection
close(zz)

Not the most beautiful solution, but it works (provided you don't have a " character somewhere in the file, of course).
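
For reference, the same idea can be wrapped into one function. This is only a sketch under a few assumptions: read_escaped_csv is a hypothetical helper name, not part of base R; it takes the lines as a character vector (for example from readLines()); it uses a Perl-style negative lookbehind in place of the pattern above, so that an empty field '' is also converted; and it stops if the input already contains a " character. A backslash that is itself escaped right before a quote is not handled.

```r
# Hypothetical helper: parse single-quoted CSV lines whose fields may
# contain backslash-escaped single quotes (\').
read_escaped_csv <- function(lines) {
  # The substitution below is ambiguous if a double quote is already present
  if (any(grepl("\"", lines, fixed = TRUE))) {
    stop("input already contains a double quote character")
  }
  # Replace every ' that is not preceded by a backslash with "
  lines <- gsub("(?<!\\\\)'", "\"", lines, perl = TRUE)
  # Replace each escaped \' with a plain '
  lines <- gsub("\\'", "'", lines, fixed = TRUE)
  # Parse the converted lines, using " as the quote character
  read.csv(text = lines, header = FALSE, quote = "\"",
           stringsAsFactors = FALSE)
}

# The sample line from the question, typed as an R string
sample_line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"
result <- read_escaped_csv(sample_line)
result$V3
```

With a file on disk, the call would be read_escaped_csv(readLines(filename)).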