Mohammad Osman Mohammad Osman - 1 month ago 7
R Question

Importing a similar column (eg. pH column) from 60 csv files thru use of loop and making a matrix .

I want to create a loop that reads a specific column (Lets say pH) in a directory of 60 csv files and write that column in csv file next to each other as the order of files in the directory.

directory <- "C:/"
file_name <- list.files(directory, pattern = ".csv")
files.to.read <- paste(directory, file_name, sep="/")
for (i in 1:length(files.to.read)) {
dta <- read.csv(files.to.read[i], header=TRUE)
x<-dta$pH_
}


How can I code this to give me a csv file with 60 columns of x next to each other. Note that the number of rows are not necessary equal in all the files so when I cbind them, it gives me an error. Thank you in advance.

Answer
directory <- "C:/"
file_name <- list.files(directory, pattern = ".csv")
files.to.read <- paste(directory,  file_name, sep="/")
x<-list()

I'd suggest storing your data from your CSVs in a list:

for (i in 1:length(files.to.read)) {
  dta <- read.csv(files.to.read[i], header=TRUE)
  x[[i]]<-dta$pH
}

Make a null variable to store the lengths of each pH dataset

ListLength<-NULL


for (i in 1:length(x)){
  ListLength<-c(ListLength, length(x[[i]]))
}

Then make another variable to determine how short each column will be from the longest pH dataset. Call this ManyNAs

ManyNAs<-max(ListLength)-ListLength

Make a null dataframe that has as many rows as your longest pH dataset.

OutputDF<-data.frame(matrix(NA, nrow = max(ListLength), ncol = length(file_name)))

Then add in additional NAs to make dataframe happy.

for (i in 1:length(x)){
  OutputDF[,i]<-c(x[[i]],rep(NA, ManyNAs[i]))

}

write.csv(OutputDF, "OutputDF.csv")

If you really don't want NAs in your final CSV file you can do this:

for (i in 1:length(x)){
   OutputDF[,i]<-c(x[[i]],rep("",ManyNAs[i]))
}