which_command - 11 days ago
R Question

R import multiple files and perform complex function on them

I have a series of files. When I read and manipulate them individually, I don't have any problems.

They are organised as follows, for example:

chrY<-read.table('chrY.txt', sep ='', header=F)
head(chrY)
V1
1 4.514563
2 4.543689
3 4.553398
4 4.533981
5 4.495146
6 4.514563


I need to convert each of the values to numeric, so I tried this for a list of chromosome files:

temp = list.files(pattern="chr*.txt")
for (i in 1:length(temp)) assign(temp[i], read.table(temp[i], sep ='', header=F))
> temp
[1] "chr17.txt" "chr18.txt"
[3] "chr19.txt" "chr1.txt"
[5] "chr6.txt" "chrY.txt"


Conversion to numerics:

for(i in temp){
temp[i]<-as.numeric(temp[i])
}


I want to average the values across all the files and plot the result.

Plotting one imported file is fine:

plot(chrY.txt[,1])


My attempt at plotting the average values across all of them is as follows:

for(i in length(temp)-1){ #index -1 such that iteration is not out of range
x<-(temp[i][,1]+temp[i+1][,1])/length(temp)
}
plot(x)


However, the averaging step gives the following error:


Error in temp[i][, 1] : incorrect number of dimensions


Is there a faster way of doing this than a for loop in R? This is a practice run, and I will potentially be importing a lot of files to average over.

Answer

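We can read the files into a list with lapply, then use Reduce with + to add the data frames element-wise and divide by the length of the list to get the mean value at each position. This assumes every file has the same number of rows. Note that the pattern argument of list.files is a regular expression, not a shell glob.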

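# pattern is a regular expression: anchor it and escape the dot
temp <- list.files(pattern = "^chr.*\\.txt$")
# read every file into one list of data frames
lst <- lapply(temp, read.table, header = FALSE)
# element-wise sum of the data frames, divided by their count, gives the mean
Reduce(`+`, lst) / length(lst)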
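
As for why the original attempt failed: temp is a character vector of file names, so temp[i] is a single string, not the imported data. That is why temp[i][, 1] reports an incorrect number of dimensions, and as.numeric(temp[i]) would only return NA. Also, for(i in length(temp)-1) runs exactly once, because length(temp)-1 is a single number rather than a sequence. Below is a minimal, self-contained sketch of the list-based approach. The demo values, the temporary directory and the names demo_dir, cols and avg are made up for illustration, and rowMeans is swapped in as an equivalent alternative to Reduce.

```r
# Write three small single-column files (made-up data) to a temporary
# directory so the example runs anywhere.
demo_dir <- tempfile("chr_demo")
dir.create(demo_dir)
demo_values <- list(chr1 = c(1, 2, 3), chr6 = c(4, 5, 6), chrY = c(7, 8, 9))
for (nm in names(demo_values)) {
  writeLines(as.character(demo_values[[nm]]),
             file.path(demo_dir, paste0(nm, ".txt")))
}

# list.files() takes a regular expression, so anchor it and escape the dot.
files <- list.files(demo_dir, pattern = "^chr.*\\.txt$", full.names = TRUE)

# Read the single column of each file as a vector; name the list by file.
cols <- lapply(files, function(f) read.table(f, header = FALSE)[[1]])
names(cols) <- basename(files)

# Bind the vectors into a matrix (one column per file) and average each row.
# This gives the same values as Reduce(`+`, cols) / length(cols).
avg <- rowMeans(do.call(cbind, cols))

# Plot one file and overlay the average; [[ ]] extracts the vector itself.
plot(cols[["chrY.txt"]], type = "l", ylim = range(unlist(cols)))
lines(avg, col = "red")
```

Keeping the data in a named list avoids assign() and lets the same code scale to any number of files, as long as they all have the same number of rows.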