Alexander Alexander - 4 days ago 7
R Question

Modify filenames in a list of files and add them as a new column

I am having trouble adding a new column to each element of a list, where the column contains a modified version of the file name. So far I can use substr to create the filenames column, but I have not been able to add new strings such as "best" or "worst" to it.

Here is my reproducible attempt.

This part only writes the .txt files to the working directory:

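# Write 10.txt ... 90.txt, each holding a random permutation of 1..k
# (k drawn at random from 1..maxRows). The argument n is not used.
writeFiles <- function(n, maxRows = 10) {
  lapply(seq(10, 90, 10), function(x)
    write.table(sample(sample(maxRows)[1], replace = FALSE), paste0(x, ".txt"),
                quote = FALSE, col.names = FALSE, row.names = FALSE))
}
writeFiles(9, 10)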

filesToProcess <- dir(pattern = "\\.txt$")

"10.txt" "20.txt" "30.txt" "40.txt" "50.txt" "60.txt" "70.txt" "80.txt" "90.txt"


In the next step I read these files and add a filenames column that holds only the first character of each file name.

data.list <- lapply(filesToProcess, function(x) {
  tmp <- read.table(file = x, header = FALSE, fill = TRUE, comment.char = '*')
  # tmp$filenames <- paste0(substr(x,1,1),c("best","worst"),sep="")
  tmp$filenames <- substr(x, 1, 1)
  return(tmp)
})

data.list
[[1]]
V1 filenames
1 4 1
2 3 1
3 7 1
4 8 1
5 1 1
6 2 1
7 6 1
8 5 1

[[2]]
V1 filenames
1 4 2
2 1 2
3 5 2
4 3 2
5 2 2
6 6 2
7 7 2

[[3]]
V1 filenames
1 1 3
2 3 3
3 2 3


etc.

I also want to add new character strings to the filenames column, so I tried paste0 inside lapply:

data.list <- lapply(filesToProcess,function(x){
tmp <- read.table(file=x, header = F,fill=T, comment.char='*')
tmp$filenames <- paste0(rep(c("best","worst"),c(4,5)),substr(x,1,1),sep="")
return(tmp)
})


Error in `$<-.data.frame`(`*tmp*`, "filenames", value = c("best1", "best1", : replacement has 9 rows, data has 8



The goal is for the first 4 .txt files to be marked as best and the remaining 5 files as worst.

How can I do that inside lapply?

Answer

We can subset first four elements of 'data.list', loop through them, and transform the 'filenames' column.

data.list[1:4] <- lapply(data.list[1:4], transform, filenames= paste0("best", filenames))

The same can be done with the remaining 5 list elements:

data.list[5:9] <- lapply(data.list[5:9], transform, filenames= paste0("worst", filenames))
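
The two subsetting steps above could also be folded into a single pass. Below is a minimal sketch, assuming 'data.list' has nine elements with a one-character 'filenames' column as in the question. The small data frames built at the top are a hypothetical stand-in so the example runs on its own, and 'labels' is a name introduced here for illustration.

```r
# Hypothetical stand-in for data.list: nine small data frames with a
# one-character 'filenames' column, shaped like those in the question.
data.list <- lapply(1:9, function(i) {
  data.frame(V1 = seq_len(3), filenames = as.character(i),
             stringsAsFactors = FALSE)
})

# One label per list element: four "best", then five "worst".
labels <- rep(c("best", "worst"), c(4, 5))

# Map() pairs each data frame with its own label; paste0() then recycles
# that single label across all rows of the data frame.
data.list <- Map(function(df, lab) {
  df$filenames <- paste0(lab, df$filenames)
  df
}, data.list, labels)

unique(data.list[[1]]$filenames)  # "best1"
unique(data.list[[9]]$filenames)  # "worst9"
```

Pairing the labels with the list elements avoids hard-coding the index ranges 1:4 and 5:9 in two separate calls.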

We can also do this directly from 'filesToProcess' by creating a vector of 'best' and 'worst' labels and then looping over the indices of 'filesToProcess':

v1 <- rep(c("best", "worst"), c(4, 5))
data.list <- lapply(seq_along(filesToProcess), function(i) {
    tmp <- read.table(file = filesToProcess[i], header = FALSE, fill = TRUE, comment.char = '*')
    # v1[i] is a single label, so it is recycled across all rows of tmp
    tmp$filenames <- paste0(v1[i], substr(filesToProcess[i], 1, 1))
    tmp
})
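
The attempt in the question fails because rep(c("best","worst"), c(4,5)) is a vector of length 9, so pasting it inside the function produces nine values for every file, whatever its row count. Indexing the label vector by file position yields one label per file, which is then recycled across that file's rows. Here is a self-contained sketch of this, assuming a temporary folder (the name "label_demo" is hypothetical) and fixed row counts in place of the random ones, so that the result is predictable.

```r
# Work in a temporary folder so the sketch leaves the working directory alone.
tmp.dir <- file.path(tempdir(), "label_demo")  # hypothetical folder name
dir.create(tmp.dir, showWarnings = FALSE)

# Write 10.txt ... 90.txt; file k0.txt gets k rows (fixed, not random).
for (x in seq(10, 90, 10)) {
  write.table(seq_len(x / 10), file.path(tmp.dir, paste0(x, ".txt")),
              quote = FALSE, col.names = FALSE, row.names = FALSE)
}

files <- dir(tmp.dir, pattern = "\\.txt$", full.names = TRUE)
v1 <- rep(c("best", "worst"), c(4, 5))  # one label per file

res <- lapply(seq_along(files), function(i) {
  tmp <- read.table(files[i], header = FALSE)
  # v1[i] has length 1, so it fits a data frame with any number of rows
  tmp$filenames <- paste0(v1[i], substr(basename(files[i]), 1, 1))
  tmp
})

# Each data frame should carry exactly one distinct label.
labs <- sapply(res, function(d) unique(d$filenames))
print(labs)
```

The files here have between 1 and 9 rows, and each one still receives a single consistent label.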