Amstell - 1 month ago
R Question

Merge multiple *.bil climate data into *.csv

I have over 7,000 *.bil files that I'm trying to merge into one *.csv file and export. I am able to read a single *.bil file using raster and as.data.frame:

library(raster)

setwd("/.../Prism Weather Data All/")
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")
r <- raster("PRISM_ppt_stable_4kmM2_189501_bil.bil")
test <- as.data.frame(r, na.rm = TRUE)


This sets the working directory and lists all the *.bil files. So far I only raster one file and convert it with as.data.frame to verify it's correct, which works perfectly. What I can't figure out is how to merge all 7,000 files (filenames) into one.

Any help with this would be appreciated. Thanks in advance.

Answer

Assuming 7000 is a real number and not an approximation, and that the data in every file are identically structured (same number of columns and rows):

library(raster)

setwd("/.../Prism Weather Data All/")

nc <- ## put the number of columns of each file (assuming they're all the same)
nr <- ## put the number of rows of each file (assuming they're all the same)

filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")

# initialize what is likely to be a large object
final.df <- as.data.frame(matrix(NA, ncol = 7000 * nc, nrow = nr))
counter <- 1
# loop through the files, filling nc columns per file
for (i in filenames) {
    r <- raster(i)
    test <- as.data.frame(r, na.rm = TRUE)
    # note the parentheses: ":" binds tighter than "+", so
    # counter:counter + nc would index the wrong columns
    final.df[, counter:(counter + nc - 1)] <- test
    counter <- counter + nc
}

# write the csv
write.csv(final.df, "final-filename.csv")

Keep in mind that your machine needs enough memory to hold all the data at once, since R keeps its objects in memory.
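A quick back-of-envelope check can tell you whether the object will fit (the dimensions here are hypothetical; a numeric value in R takes 8 bytes, so nr rows by 7000*nc columns costs roughly nr * 7000 * nc * 8 bytes):

```r
nr <- 1000                      # hypothetical number of rows per file
nc <- 1                         # hypothetical number of columns per file
bytes <- nr * 7000 * nc * 8     # 8 bytes per numeric value
round(bytes / 1024^2)           # approximate size in MiB
```

If that number approaches your available RAM, consider writing intermediate chunks to disk instead of one giant data frame.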

If the number of columns differs from file to file, adjust the column indexes in the final.df assignment inside the loop, and advance counter by the actual width of each file.
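That bookkeeping can be sketched with plain data frames standing in for the per-file results of as.data.frame(raster(i)) (the data here are hypothetical); the width of each piece is measured with ncol() before advancing the counter:

```r
# Stand-ins for three files with differing column counts (hypothetical data).
pieces <- list(
    data.frame(a = 1:4, b = 5:8),                   # 2 columns
    data.frame(c = 9:12),                           # 1 column
    data.frame(d = 13:16, e = 17:20, f = 21:24)     # 3 columns
)

# Pre-allocate generously, then fill column blocks of varying width.
nr <- 4
final.df <- as.data.frame(matrix(NA, ncol = 10, nrow = nr))
counter <- 1
for (piece in pieces) {
    nc.i <- ncol(piece)                             # width of this file's block
    final.df[, counter:(counter + nc.i - 1)] <- piece
    counter <- counter + nc.i                       # advance by the actual width
}
final.df <- final.df[, seq_len(counter - 1)]        # drop the unused columns
```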


Edit: to produce expected result

I think a for loop is about the only way to do this kind of job. And 7,000 files is indeed a large set, so expect the loop to take a while.

library(raster)

setwd("/.../Prism Weather Data All/")

nc <- ## put the number of columns you expect the data in the files to have
nr <- ## put roughly the number of rows times 12 (if you plan to read a year's worth of data)
      ## PLUS some tolerance, so you'll end up with an object actually larger than needed

filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = "\\.bil$")

# initialize what is likely to be a large object
final.df <- as.data.frame(matrix(NA, ncol = nc, nrow = nr))
counter <- 1
# loop through the files, appending each one's rows
for (i in filenames) {
    r <- raster(i)
    test <- as.data.frame(r, na.rm = TRUE)
    numrow2 <- nrow(test)
    # parentheses matter: ":" binds tighter than "+"
    final.df[counter:(counter + numrow2 - 1), ] <- test
    counter <- counter + numrow2
}

# drop the unused rows (rows of a data frame can't be removed with NULL)
final.df <- final.df[seq_len(counter - 1), , drop = FALSE]

# write the csv
write.csv(final.df, "final-filename.csv")
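As an aside, the pre-allocation and counter arithmetic can be avoided entirely by collecting the per-file data frames in a list and row-binding them in one step. A minimal sketch of that pattern, with hypothetical data frames standing in for as.data.frame(raster(f), na.rm = TRUE):

```r
# Hypothetical per-file results; in the real script each element would come
# from as.data.frame(raster(f), na.rm = TRUE) for one filename f.
per.file <- lapply(1:3, function(f) data.frame(ppt = f * 1:5))

# Stack all the pieces into one data frame, then write it out once.
final.df <- do.call(rbind, per.file)
# write.csv(final.df, "final-filename.csv")
```

This trades the memory overhead of an oversized pre-allocated object for holding all the pieces in a list, so the same memory caveat applies.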

Hope it helps.