I am relatively new to working with large data in R and I am hoping for some advice on how to handle a 50 GB csv file. The current problem is the following:
The table looks like:
ID,Address,City,States,... (50 more fields of characteristics of a house)
# the first column is created by write.csv, which added a row-index column to the file
I tried read.csv.ffdf from the ff package:

library(ff)
all <- read.csv.ffdf(
  file = "<path of large file>",
  sep = ",",
  header = TRUE)
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered,  :
  vmode 'character' not implemented
You can use R with SQLite behind the scenes via the sqldf package. You'd use the read.csv.sql function from that package to import the file, and then you can query the data however you want to obtain only the smaller data frame you need in R.
The example from the docs:

library(sqldf)
iris2 <- read.csv.sql("iris.csv",
    sql = "select * from file where Species = 'setosa' ")
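As a rough sketch of how this could look for your file (the path placeholder and the States column name are taken from your header; the 'CA' value is only an illustrative filter, so adjust both to your actual data):

library(sqldf)

# Let SQLite parse the large file and return only the matching rows,
# so the full 50 GB table never has to fit into R's memory at once.
# "<path of large file>" and the column "States" come from the question;
# 'CA' is just an example value.
subset_df <- read.csv.sql(
  "<path of large file>",
  sql = "select * from file where States = 'CA'",
  header = TRUE,
  sep = ","
)

read.csv.sql loads the file into a temporary SQLite database and runs the query there, so the filtering happens outside of R, which is what makes this workable for files much larger than RAM.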
I've used this library on VERY large CSV files with good results.