john1607 - 3 months ago
R Question

For Loop Only Showing Last Record

I'm learning how to write an R function that reads a directory full of files and reports the number of completely observed cases in each data file.

My function works with one case, but with multiple cases the loop only shows the last record.

complete <- function(directory, id = 1:332) {
        files_list <- list.files(path = directory, full.names = TRUE)
        dat <- data.frame()
        for (i in id) {
                dat <- rbind(dat, read.csv(files_list[i]))
        }
        nobs <- sum(complete.cases(dat))
        id <- i
        data.frame(id, nobs)
}


My expected result when running

> complete("specdata", 1:6)

## id nobs
## 1 1 932
## 2 2 711
## 3 3 475
## 4 4 338
## 5 5 586
## 6 6 463


Instead, when id = 1:6, rather than returning a data.frame with one row per file, it returns a single row:

> complete("specdata", 1:6)


id nobs
1 6 3562


I suspect the problem is that the function is replacing the values each time as it loops through. I've searched SO and elsewhere for help with "only showing last record" problems and cannot figure out a solution from those other answers.

Thank you in advance for any help. I'm brand new to R as I'm sure is abundantly obvious.

Answer

This should work:

complete <- function(directory, id = 1:332) {
        files_list <- list.files(path = directory, full.names = TRUE)
        # Accumulates one row per file
        result <- data.frame()
        for (i in id) {
                # Read only the current file so the count is per file, not cumulative
                dat <- read.csv(files_list[i])
                nobs <- sum(complete.cases(dat))
                result <- rbind(result, data.frame(id = i, nobs = nobs))
        }
        result
}

Details:

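The original function builds the data frame only after the for loop has finished, so i holds its last value and dat holds every file combined, which is why nobs is the overall total. To get one row per file, nobs must be computed from the current file alone, and a row must be added with rbind at every iteration; the accumulated data frame is returned at the end.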
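
If you would rather not grow a data frame with rbind inside a loop, the same per-file counts can be collected in one pass with vapply. This is only a sketch: the name complete_vapply is made up for illustration, and it assumes (as the question's code does) that list.files() returns the files in an order where position i corresponds to id i.

```r
# Hypothetical alternative: count complete cases per file without rbind in a loop.
# Assumes files_list[i] is the file for id i, as in the question's code.
complete_vapply <- function(directory, id = 1:332) {
  files_list <- list.files(path = directory, full.names = TRUE)
  # For each requested id, read that one file and count its complete rows
  nobs <- vapply(
    id,
    function(i) sum(complete.cases(read.csv(files_list[i]))),
    integer(1)
  )
  data.frame(id = id, nobs = nobs)
}
```

Calling complete_vapply("specdata", 1:6) should then return one row per requested file, provided the directory layout matches the assumption above.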