Mohit - 3 months ago
R Question

Passing Array in pdftools library of R

I am trying to convert multiple PDF files to Excel so that I can manipulate the text with VBA and find some specific figures. The code I have written is:

library("pdftools")
setwd("C:/Users/mohit.bansal/Desktop/CSL")
filenames <- list.files(pattern = "*.pdf", all.files = TRUE )
filenames
txt <- pdf_text(filenames[1])
write.table(txt, file = paste(filenames[1], ".xls", sep = ""), sep = " ")
txt <- pdf_text(filenames[2])
write.table(txt, file = paste(filenames[2], ".xls", sep = ""), sep = " ")
txt <- pdf_text(filenames[3])
write.table(txt, file = paste(filenames[3], ".xls", sep = ""), sep = " ")


Here I collect all the PDF file names in a character vector named filenames and then pass them to pdf_text() one at a time to convert each file. I would like to get rid of the repetitive lines at the end: with 25 files in the folder, I would need to write those lines 25 times. Is there a way to pass all the names at once?

Answer
library(pdftools)

setwd("C:/Users/mohit.bansal/Desktop/CSL")

# list.files() expects a regular expression, not a glob, so match the extension explicitly
filenames <- list.files(pattern = "\\.pdf$", all.files = TRUE)

for (fname in filenames) {
  txt <- pdf_text(fname)
  write.table(txt, file = paste(fname, ".xls", sep = ""), sep = " ")
}

That said, running help("for") in the console would have provided sufficient information on how to use a for loop.

The "problem" with using the *apply family of functions here is that they return a result to the calling environment (even if only temporarily), although the loop is run purely for its side effect of writing files. Even purrr::walk() returns data, but at least it does so invisibly (and returns the original input unmodified).
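
For comparison, here is a minimal sketch of what the *apply and purrr versions might look like. The helper name convert_one and its reader argument are hypothetical, introduced only for illustration; reader defaults to pdftools::pdf_text and is a parameter only so the idea can be tried without real PDF files.

```r
# Hypothetical helper (not part of pdftools): converts a single file.
convert_one <- function(fname, reader = pdftools::pdf_text) {
  txt <- reader(fname)  # with pdf_text: a character vector, one element per page
  write.table(txt, file = paste(fname, ".xls", sep = ""), sep = " ")
}

filenames <- list.files(pattern = "\\.pdf$")

# lapply() builds and returns a list of results (here a list of NULLs,
# because write.table() returns NULL); invisible() only hides the printing.
invisible(lapply(filenames, convert_one))

# purrr::walk() calls the function for its side effect and invisibly
# returns its input unchanged. Run it only if purrr is installed.
if (requireNamespace("purrr", quietly = TRUE)) {
  purrr::walk(filenames, convert_one)
}
```

Both variants write the same files as the for loop above; the difference is only in what is handed back to the caller.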
