Daniel Winkler - 2 months ago
R Question

Read huge .csv file with some columns in single quotes but not all with fread from the data.table package

I apologize that I cannot really create a reproducible example (or I guess at least not according to the rules) but still hope for help.
I am using the data from here:
American Housing Survey 2013 data

Since the data files are quite big I would like to use the "fread" command instead of the "read.csv" command. With read.csv I could just do the following:

homimp <- read.csv("homimp.csv", quote = "'")
head(homimp)
       CONTROL RAS RAH  RAD JRAS JRAD
1 100003130103  74   2   96   -9    9
2 100006110249  35   2 8358   -9    9
3 100006110249  36   2 5970   -9    9
4 100006110249  37   2 6567   -9    9
5 100006110249  40   2  716   -9    9
6 100006110249  45   2 1910   -9    9


and it would remove the quotes (note that one column, RAD, is not quoted in the first place).
However, when I read the file with fread, I cannot find a way to remove the quotes.
Passing the quote argument returns an error:

homimpdt <- fread("homimp.csv", quote = "'")
Error in fread("homimp.csv", quote = "'") : unused argument (quote = "'")


And without the argument, the quotes are not removed:

homimpdt <- fread("homimp.csv")
head(homimpdt)
          CONTROL  RAS RAH  RAD JRAS JRAD
1: '100003130103' '74' '2'   96 '-9'  '9'
2: '100006110249' '35' '2' 8358 '-9'  '9'
3: '100006110249' '36' '2' 5970 '-9'  '9'
4: '100006110249' '37' '2' 6567 '-9'  '9'
5: '100006110249' '40' '2'  716 '-9'  '9'
6: '100006110249' '45' '2' 1910 '-9'  '9'


Why I want to do this:

> system.time(newhouse <- read.csv('newhouse.csv', quote = "'"))
   user  system elapsed
  24.86    0.68   25.77
> system.time(newhousedt <- fread('newhouse.csv'))
Read 84355 rows and 760 (of 760) columns from 0.273 GB file in 00:00:04
   user  system elapsed
   3.33    0.07    3.41


Thank you very much for your help!

Regarding Psidom's comment:

homimpdt <- fread("homimp.csv", quote = "\'")
Error in fread("homimp.csv", quote = "'") : unused argument (quote = "'")

Answer

Summary of the answers given in comments:

Solution #1: Thanks to @Psidom and @jangorecki

Install the development version of data.table (v1.9.7), whose fread() accepts a quote argument:

install.packages("data.table", type="source", repos="http://Rdatatable.github.io/data.table")

Then run:

homimpdt <- fread("homimp.csv", quote = "\'")
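
The "unused argument" error in the question indicates that the installed fread() has no quote parameter. As a rough check before or after upgrading, the sketch below inspects the formal arguments of fread(). The variable name has_quote_arg is only illustrative, and the check assumes nothing beyond base R plus an optional data.table installation.

```r
# Illustrative check (the variable name is hypothetical): does the installed
# data.table expose a `quote` argument on fread()?
# `&&` short-circuits, so data.table::fread is only touched if the package exists.
has_quote_arg <- requireNamespace("data.table", quietly = TRUE) &&
  "quote" %in% names(formals(data.table::fread))

if (has_quote_arg) {
  message("fread() accepts quote; the 'unused argument' error should not occur")
} else {
  message("fread() has no quote argument (or data.table is missing); upgrade first")
}
```

If the check reports FALSE after installing, the old version is probably still loaded in the session, and restarting R is the usual remedy.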

Solution #2 (Linux only): thanks to @RichScriven

The approach is described here: Preventing column-class inference in fread()

When applying it, set as.is = TRUE in the type.convert() call.
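
Where neither upgrading nor a Linux-only route is an option, one portable fallback is to read the file with fread() as in the question and clean up afterwards. This is a sketch and not the linked answer's method: it strips the single quotes from character columns and lets type.convert() with as.is = TRUE re-infer the types. The helper strip_single_quotes and the small raw data frame are hypothetical stand-ins; raw mimics the quoted fread() output shown in the question.

```r
# Hypothetical helper (not part of data.table): strip single quotes from
# character columns, then let type.convert() re-infer each column's type.
strip_single_quotes <- function(df) {
  df[] <- lapply(df, function(col) {
    if (!is.character(col)) return(col)           # unquoted columns (e.g. RAD) pass through
    unquoted <- gsub("'", "", col, fixed = TRUE)  # remove every single quote
    type.convert(unquoted, as.is = TRUE)          # as.is = TRUE: strings stay character, no factors
  })
  df
}

# Small stand-in for what fread() returns when the quotes are kept as data
raw <- data.frame(
  CONTROL = c("'100003130103'", "'100006110249'"),
  RAS     = c("'74'", "'35'"),
  RAD     = c(96L, 8358L),
  JRAS    = c("'-9'", "'-9'"),
  stringsAsFactors = FALSE
)

clean <- strip_single_quotes(raw)
str(clean)
```

On the real data the helper would be applied to the fread() result, for example strip_single_quotes(fread("homimp.csv")). CONTROL comes back as numeric (double) rather than integer because its values exceed the integer range. Note that gsub() removes every single quote, so this only suits files where quotes never appear inside a value.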