Max Max - 1 year ago 43
R Question

How to read quoted text containing escaped quotes

Consider the following comma separated file. For simplicity let it contain one line:

'I am quoted','so, can use comma inside - it is not separator here','but can\'t use escaped quote :=('

If you try to read it with the command

table <- read.csv(filename, header=FALSE)

the line will be split into 4 parts, because it contains 3 commas. What I actually want is 3 fields, one of which contains a comma itself. This is where the quote argument comes in. I tried:

table <- read.csv(filename, header=FALSE, quote="'")

but that fails with the message "incomplete final line found by readTableHeader on table". That happens because the line contains an odd number (seven) of quote characters.
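Both counts can be checked directly on the sample line. This is only a small sketch: the variable names are made up for illustration, and the line is typed in as an R string, so the backslash is doubled in the source.

```r
# The sample line from above; "\\" in R source is a single backslash in the data
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"

# Naive split on commas, which is what happens when ' is not treated as a quote
n_parts <- length(strsplit(line, ",", fixed = TRUE)[[1]])

# Number of single-quote characters in the line
n_quotes <- lengths(regmatches(line, gregexpr("'", line, fixed = TRUE)))

n_parts   # 4
n_quotes  # 7
```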

read.csv, as well as scan, has the parameter allowEscapes, but setting it to TRUE doesn't help. That is to be expected, because the documentation of scan says:

The escapes which are interpreted are the control characters
‘\a, \b, \f, \n, \r, \t, \v’, ...
... Any other escaped
character is treated as itself, including backslash

Please suggest how you would read such quoted CSV files containing escaped quotes.

Answer Source

One possibility is to use readLines() to read everything in as is, and then replace the quote character with something else, e.g.:

tt <- readLines("F:/temp/test.txt")
tt <- gsub("([^\\]|^)'", "\\1\"", tt)     # replace each unescaped ' with "
tt <- gsub("\\'", "'", tt, fixed = TRUE)  # turn each escaped \' into a plain '

The vector tt can then be read in through a textConnection:

zz <- textConnection(tt)
read.csv(zz, header = FALSE, quote = "\"")  # read from the text connection

Not the most beautiful solution, but it works (provided you don't have a " character somewhere in the file, of course).
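If the file may contain " characters, one variation is to leave the single quotes alone, hide each escaped \' behind a placeholder, read with quote="'", and restore the apostrophes afterwards. The following is a minimal sketch, not a tested general solution: it assumes the placeholder token (the made-up string @@APOS@@) never occurs in the data, and it uses a temporary file in place of the real one.

```r
# Write the sample line to a temporary file (stand-in for the real CSV file)
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"
path <- tempfile(fileext = ".csv")
writeLines(line, path)

# Assumption: this token does not appear anywhere in the data
placeholder <- "@@APOS@@"

# Read the raw lines and mask every escaped quote \' with the placeholder
raw <- readLines(path)
masked <- gsub("\\'", placeholder, raw, fixed = TRUE)

# Now every remaining ' is a real field delimiter, so quote = "'" is safe
df <- read.csv(text = masked, header = FALSE, quote = "'",
               stringsAsFactors = FALSE)

# Put the apostrophes back in all character columns
df[] <- lapply(df, function(col) {
  if (is.character(col)) gsub(placeholder, "'", col, fixed = TRUE) else col
})

df$V3
```

With the sample line this gives three columns, and the third one holds the text with a plain apostrophe in "can't". The same masking idea works with textConnection instead of the text argument.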