I want to use data.table to process a very big file that doesn't fit in memory. I've thought of reading the file in chunks in a loop, increasing the skip parameter appropriately on each iteration:
fread("myfile.csv", skip=loopindex, nrows=chunksize)
1) count.fields I'm not sure whether count.fields reads the whole file into R at once. Try it to see if it works:
length(count.fields("myfile.csv", sep = ","))
If the file has a header, subtract one from the above.
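For example, assuming a header row:

n <- length(count.fields("myfile.csv", sep = ",")) - 1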
2) sqldf Another possibility is:
library(sqldf)
read.csv.sql("myfile.csv", sep = ",", sql = "select count(*) from file")
You may need other arguments as well depending on header, etc. Note that this does not read the file into R.
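For example, to capture just the number (this sketch assumes the default header = TRUE, so the header line is not counted):

library(sqldf)
n <- read.csv.sql("myfile.csv", sep = ",", sql = "select count(*) from file")[[1]]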
3) wc Use the system command wc, which should be available on all platforms that R runs on.
system("wc -l myfile.csv")
Again, if there is a header, subtract one.
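To capture the count in R, intern the output and parse off the leading number (a sketch assuming wc is on the PATH; the output looks like "123 myfile.csv"):

out <- system("wc -l myfile.csv", intern = TRUE)   # shell(..., intern = TRUE) on Windows
n <- as.integer(sub("^[[:space:]]*([0-9]+).*", "\\1", out))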
If you are on Windows, make sure Rtools is installed and use:
shell("C:\\Rtools\\bin\\wc -l myfile.csv")
Alternatively, on Windows without Rtools, try this:
shell('find /v /c "" myfile.csv')
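find prints something like "---------- MYFILE.CSV: 123" rather than a bare number, so to get the count into R you would parse it out (a sketch; the exact output format may vary by Windows version):

res <- shell('find /v /c "" myfile.csv', intern = TRUE)
n <- as.integer(sub(".*: *", "", res[grepl(":", res)]))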