Mathias Burton - 1 month ago
R Question

Struggling to build a merged data frame to run analysis on

I am lost trying to take a folder of csv files and merge them into a single data frame. The files are numbered 1 to 332.csv in a folder (which is currently my working directory).

What I am trying to end up with is a data frame from which I can extract the mean of a column over the complete cases, plus a count of the complete cases.

Here's where my code currently stands:

# List a set of the files
fileList = list.files(pattern="*.csv")

# Make data frame for each file
df = c(rep(data.frame(), length(fileList)))

# Read csv files into data frames
for (i in 1:length(fileList)) { df[[i]] <- as.list(read.csv(fileList[i])) }

#merge data frames into a single data frame
fullFrame <- rbind(df[[i]])

#isolate to just complete cases
completeFrame <- complete.cases(fullFrame)

fullFrame[completeFrame]


My expectation was to get a large table-like view of all the cases together, with the NA cases removed.

Instead it outputs:

> fullFrame[completeFrame]

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

[[7]]
NULL

[[8]]
NULL

[[9]]
NULL

[[10]]
NULL

[[11]]
NULL

[[12]]
NULL

[[13]]
NULL

[[14]]
NULL

[[15]]
NULL

[[16]]
NULL

Answer

Even though you want a data.frame, data.table offers extremely fast and foolproof functions for dealing with this exact problem:

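library(data.table)

# pattern is a regular expression, not a glob, so match the extension explicitly
fileList <- list.files(pattern = "\\.csv$")
# read each file into a data.table
listing <- lapply(fileList, fread)
# stack them into one table; if the files have unequal columns, add fill = TRUE
dt <- rbindlist(listing)
# drop every row that contains an NA, leaving only the complete cases
dt <- na.omit(dt)
# convert back to a plain data.frame
df <- as.data.frame(dt)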
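
If you would rather stay in base R, the same steps can be sketched as below. As far as I can tell, two things go wrong in the code from the question: rbind(df[[i]]) runs after the loop has finished, so it only ever sees the last file, and fullFrame[completeFrame] has no comma, so it does not select rows. The sketch writes three tiny csv files to a temporary folder so it runs on its own; the column names ID and value are made up for illustration and are not taken from your data.

```r
# Self-contained demo: create a fresh temporary folder with three small csv files.
# The column names "ID" and "value" are hypothetical stand-ins for the real ones.
tmp <- tempfile("csv_demo")
dir.create(tmp)
write.csv(data.frame(ID = 1, value = c(1, NA, 3)), file.path(tmp, "1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, value = c(4, 5)),     file.path(tmp, "2.csv"), row.names = FALSE)
write.csv(data.frame(ID = 3, value = c(NA, 6)),    file.path(tmp, "3.csv"), row.names = FALSE)

# pattern is a regular expression, so anchor it on the file extension
fileList <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames (no pre-allocation needed)
frames <- lapply(fileList, read.csv)

# Bind all list elements in one call instead of rbind-ing a single element
fullFrame <- do.call(rbind, frames)

# Note the comma: the logical vector selects rows, all columns are kept
completeFrame <- fullFrame[complete.cases(fullFrame), ]

# Count of complete cases and mean of one column over them
nComplete <- nrow(completeFrame)
meanValue <- mean(completeFrame$value)
```

On your real data you would drop the demo files, call list.files() on your working directory, and replace value with whichever column you need the mean of.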