I am trying to read a large csv file into R. Even though the file is large, I only want to work with some of the rows that fulfil a particular condition (e.g. Variable2 >= 3). This is a much smaller dataset. I would like to read these lines directly into a dataframe rather than load the whole dataset into a dataframe and then select according to the condition. The main reason being that the dataset does not easily fit into the memory of a desktop or laptop. I am looking for a solution that uses only R, and does not require python or other languages. Thanks.
You could use the
read.csv.sql function in the
sqldf package and filter using SQL select. From the help page of
library(sqldf) write.csv(iris, "iris.csv", quote = FALSE, row.names = FALSE) iris2 <- read.csv.sql("iris.csv", sql = "select * from file where Sepal.Length > 5", eol = "\n")