DnstnRmsy DnstnRmsy - 2 months ago 39
R Question

For loop returns last result

I have a small number of csv files, each containing two columns with numeric values. I want to write a for loop that reads the files, sums the columns, and stores the sum totals for each csv in a numeric vector. This is the closest I've come:

allfiles <- list.files()
for (i in seq(allfiles)) {
total <- numeric()
total[i] <- sum(subset(read.csv(allfiles[i]), select=Gift.1), subset(read.csv(allfiles[i]), select=Gift.2))

My result is all NA's save a value for the last file. I understand that I'm overwriting each iteration each time the for loop executes and I think* I need to do something with indexing.

Answer Source

The first problem is that you are not pre-allocating the right length of (or properly appending to) total. Regardless, I recommend against that method.

There are several ways to do this, but the R-onic (my term, based on pythonic ... I know, it doesn't flow well) is based on vectors/lists.

alldata <- sapply(allfiles, read.csv, simplify = FALSE)
totals <- sapply(alldata, function(a) sum(subset(a, select=Gift.1), subset(a, select=Gift.2)))

I often like to that, keeping the "raw/unaltered" data in one list and then repeatedly extract from it. For instance, if the files are huge and reading them is a non-trivial amount of time, then if you realize you also need Gift.3 and did it your way, then you'd need to re-read the entire dataset. Using my method, however, you just update the second sapply to include the change and rerun on the already-loaded data. (Most of the my rationale is based on untrusted data, portions that are typically unused, or other factors that may not be there for you.)

If you really wanted to reduce the code to a single line, something like:

totals <- sapply(allfiles, function(fn) {
  x <- read.csv(fn)
  sum(subset(x, select=Gift.1), subset(x, select=Gift.2))
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download