QuestionAnswer QuestionAnswer - 3 months ago 19
R Question

R: How to quickly read large .dta files without RAM Limitations

I have a 10 GB .dta Stata file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (4 TB HD) and the .dta file is about 3 million rows and somewhere between 400 and 800 variables.

I know data.table() is the fastest way to read in .txt and .csv files, but does anyone have a recommendation for reading largeish .dta files into R? Reading the file into Stata as a .dta file requires about 20-30 seconds, although I need to set my working memory max prior to opening the file (I set the max at 100 GB).

I have not tried importing to .csv in Stata, but I hope to avoid touching the file with Stata. A solution is found via Using memisc to import stata .dta file into R but this assumes RAM is scarce. In my case, I should have sufficient RAM to work with the file.

Answer

The fastest way to load a large Stata dataset in R is using the readstata13 package. I have compared the performance of foreign, readstata13, and haven packages on a large dataset in this post and the results repeatedly showed that readstata13 is the fastest available package for reading Stata dataset in R.

Comments