Morten Nielsen - 2 months ago
R Question

Reading large csv file in R

I have a number of CSV files of different sizes, but all somewhat big. Using read.csv to read them into R takes longer than I've been patient to wait so far (several hours). I managed to read the biggest file (2.6 GB) very fast (less than a minute) with data.table's fread.
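For reference, a minimal sketch of the two calls being compared; "bigfile.csv" is a placeholder path and sep = ";" is assumed from the error message below:

    library(data.table)

    # Base R reader - works, but took several hours on files this size
    # df <- read.csv("bigfile.csv", sep = ";", stringsAsFactors = FALSE)

    # data.table's fread - read the 2.6 GB file in under a minute
    dt <- fread("bigfile.csv", sep = ";")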

My problem occurs when I try to read a file of half the size. I get the following error message:


    Error in fread("C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv", :
      Expecting 21 cols, but line 2557 contains text after processing all cols.
      It is very likely that this is due to one or more fields having embedded
      sep=';' and/or (unescaped) '\n' characters within unbalanced unescaped
      quotes. fread cannot handle such ambiguous cases and those lines may not
      have been read in as expected. Please read the section on quotes in ?fread.


Through research I've found suggestions to add quote = "" to the call, but it doesn't help. I've also tried the bigmemory package, but R crashes when I do. I'm on a 64-bit system with 8 GB of RAM.
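For clarity, this is roughly what the suggested call looks like (path and sep = ";" taken from the error message); it still fails with the same error for me:

    library(data.table)

    # Disable quote handling entirely, as suggested in other threads
    dt <- fread("C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv",
                sep = ";", quote = "")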

I know there are quite a few threads on this subject, but I haven't been able to solve the problem with any of their solutions. I would really like to use fread (given my good experience with the bigger file), and it seems like there should be some way to make it work; I just can't figure it out.

Answer

Solved this by installing SlickEdit and using it to edit the lines that caused the trouble. A few characters like ampersands, quotation marks, and apostrophes were consistently encoded in a form that includes a semicolon - e.g. &amp; instead of just &. Since semicolon was the separator in the file, this broke the parsing with fread.
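If anyone prefers to script the cleanup instead of editing by hand, here is a rough sketch of the same idea in R; the entity list and output file name are assumptions, not what I actually did:

    library(data.table)

    in_file  <- "C:/Users/Jesper/OneDrive/UdbudsVagten/BBR/CO11700T.csv"
    out_file <- "CO11700T_clean.csv"  # hypothetical output name

    # Decode the HTML-style entities whose trailing ';' was being mistaken
    # for the field separator, then write a cleaned copy and fread that.
    raw <- readLines(in_file, encoding = "UTF-8")
    raw <- gsub("&amp;",  "&",  raw, fixed = TRUE)
    raw <- gsub("&apos;", "'",  raw, fixed = TRUE)
    raw <- gsub("&quot;", "\"", raw, fixed = TRUE)  # if restoring literal quotes
    # re-introduces unbalanced quotes, add quote = "" to the fread call below
    writeLines(raw, out_file)

    dt <- fread(out_file, sep = ";")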