user2989902 user2989902 - 10 months ago 101
R Question

Complete function in R

I am working on the following. The function works only when one ID is passed. However, when more than one ID is passed (either

for example) the output is not correct.

The task: write a function that reads a directory full of files and reports the number of completely observed cases in each data file. The function should return a data frame where the first column is the name of the file and the second column is the number of complete cases. My code works when only one ID is passed, but it does not produce the right output when several IDs or a range of IDs is passed. This is the prototype:

complete <- function(directory, id = 1:332) {
  ## 'directory' is a character vector of length 1
  ## indicating the location of the CSV files

  ## 'id' is an integer vector
  ## indicating the monitor ID numbers to be used

  ## Return a data frame of the form:
  ## id nobs
  ## 1  117
  ## 2  1041
  ## ...
  ## where 'id' is the monitor ID number and
  ## 'nobs' is the number of complete cases
}

And this is my code:

complete <- function(directory, id = 1:332) {
  # lists the files in the directory
  files_full <- list.files(directory, full.names = TRUE)

  # empty data frame where we will store the data read in the loop
  dat <- data.frame()

  nobs = numeric()
  for (i in id) {
    ## binds all the rows of the files with the "specified" ID
    dat <- rbind(dat, read.csv(files_full[i]))
  }

  nobs <- sum(complete.cases(dat))
  returnVal <- data.frame(id, nobs)
}

These are the outputs:

complete("specdata", 1)

  id nobs
1  1  117

complete("specdata", c(2, 4, 8, 10, 12))

  id nobs
1  2 1951
2  4 1951
3  8 1951
4 10 1951
5 12 1951

Can somebody tell me what I am doing wrong?

Answer Source

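You are computing nobs from the stacked data frame, so nobs is a single number: 1951 is the sum of complete cases across all the requested IDs, and data.frame(id, nobs) recycles that one value into every row. You need to compute and store the count of complete cases for each id separately: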

# one slot per requested ID
nobs <- rep(0, length(id))
k <- 1
for (i in id) {
  # read each file on its own instead of stacking them
  dat <- read.csv(files_full[i])
  nobs[k] <- sum(complete.cases(dat))
  k <- k + 1
}
returnVal <- data.frame(id, nobs)
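
For reference, here is a minimal self-contained sketch of the same per-file idea, written with vapply instead of a manual counter. The name complete_sketch, the demo directory, the file names and the toy columns x and y are all made up for illustration; like the loop above, it assumes list.files() returns the files in monitor-ID order, so that files_full[i] is the file for ID i.

```r
# Hypothetical name; same arguments as `complete` in the question.
complete_sketch <- function(directory, id = 1:332) {
  # Assumption: the sorted file list lines up with the monitor IDs
  files_full <- list.files(directory, full.names = TRUE)

  # One count per requested ID; each file is read and counted on its own
  nobs <- vapply(id, function(i) {
    sum(complete.cases(read.csv(files_full[i])))
  }, integer(1))

  data.frame(id = id, nobs = nobs)
}

# Toy data so the sketch runs without the real "specdata" directory
demo_dir <- file.path(tempdir(), "complete_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = c(1, NA, 3), y = c(1, 2, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(x = c(1, 2, 3), y = c(4, 5, NA)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- complete_sketch(demo_dir, 1:2)
print(result)
```

With the toy files, the first file has one complete row and the second has two, so each ID gets its own count rather than a shared total.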