Henri W - 1 month ago
R Question

Running functions on all files in a directory and extracting with file name preserved in R

In R, I wish to read each csv file in my directory, one at a time, as a data frame, perform some simple cross-column calculations, and then export the resulting data frame as a csv while preserving part of the original file name.

For example, in Path/To/Directory I have the following 4 files:

Prot1-Combined_Scores.csv
Prot2-Combined_Scores.csv
Prot3-Combined_Scores.csv
Prot4-Combined_Scores.csv


Each file has a dataframe that looks something like this:

V1 V2 V3 V4 V5 V6 V7
1 CHEM001 0.000 0 0 0.684255 0.91599 0.671794
2 CHEM002 0.048 4 1 0 0.953549 0.691595
3 CHEM003 0.287 1 0 0.011915 0.970648 0.854309
4 CHEM004 0.298 0 2 0.136784 0.984207 0.86979
5 CHEM005 0.000 1 0 0.578534 0.995675 0.695794


I want to make a column V8 that, for example, calculates (V2+V3+V6+V7)^2 + 2*V4 + V5/3.

Finally, I would like to save the final data frame as a csv file with a name that preserves the Prot1 part of the original file name, such as Prot1-Final_Score.csv, and the same for Prot2, Prot3, and so on.

I am new to R and I have read that lapply is useful for running functions on every file in a directory, but I particularly need help integrating the calculations I mentioned into lapply and extracting the necessary string from the file name to use when exporting.

Answer

Hope this helps! Please also share your own attempt or approach next time, so that you can learn better.

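path <- "Path/To/Directory/"
# pattern is a regular expression: match only the "...-Combined_Scores.csv" inputs
x <- list.files(path = path, pattern = "-Combined_Scores\\.csv$")
final_pathname <- paste0(path, x)

# abc must be defined before lapply calls it
abc <- function(i) {
  df <- read.csv(final_pathname[i])
  # Cross-column calculation, vectorised over all rows
  df$V8 <- (df$V2 + df$V3 + df$V6 + df$V7)^2 + 2 * df$V4 + df$V5 / 3
  # Keep the part of the file name before the first "-", e.g. "Prot1"
  prefix <- sub("-.*$", "", x[i])
  # row.names = FALSE avoids writing an extra column of row numbers
  write.csv(df, file = paste0(path, prefix, "-Final_Score.csv"), row.names = FALSE)
}

L <- lapply(seq_along(final_pathname), abc)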
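
A variant of the same idea, as a minimal sketch rather than the one right way to do it: it passes each file path straight to the function instead of an index, and builds the output name with basename() and sub(). The names process_file and score_files are made up for this example. The sketch assumes every input file has a header row naming the columns V1 to V7 and a file name of the form <prefix>-Combined_Scores.csv.

```r
# Hypothetical helper: read one file, add V8, write "<prefix>-Final_Score.csv".
# Assumes the file has a header row with columns V1..V7.
process_file <- function(infile, outdir = dirname(infile)) {
  df <- read.csv(infile, stringsAsFactors = FALSE)
  df$V8 <- (df$V2 + df$V3 + df$V6 + df$V7)^2 + 2 * df$V4 + df$V5 / 3
  # "Prot1-Combined_Scores.csv" -> "Prot1"
  prefix <- sub("-Combined_Scores\\.csv$", "", basename(infile))
  outfile <- file.path(outdir, paste0(prefix, "-Final_Score.csv"))
  write.csv(df, outfile, row.names = FALSE)
  outfile  # return the path that was written
}

# Hypothetical wrapper: apply process_file to every matching file in a directory.
score_files <- function(path) {
  files <- list.files(path, pattern = "-Combined_Scores\\.csv$", full.names = TRUE)
  vapply(files, process_file, character(1), USE.NAMES = FALSE)
}
```

Calling score_files("Path/To/Directory") should then return the paths of the files it wrote. Because the pattern only matches the -Combined_Scores.csv inputs, running it a second time does not pick up the -Final_Score.csv outputs.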