Arndt Arndt - 2 months ago 9
R Question

When importing CSV into R how to generate column with name of the CSV?

I have a large number of csv files that I want to read into R. All the Column headings in the csvs are the same. At first I thought I would need to create a loop based on the list of file names, but after searching I found a faster way. This reads in and combines all the csvs correctly (as far as i know).

filenames <- list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

library(plyr)
import.list <- llply(filenames, read.csv)

combined <- do.call("rbind", import.list)


The only problem is that I want to know which csv a specific row of data comes from. I want a column labeled 'source' that contains the name of the csv that the particular row came from. so for example if the csv was called Chicago_IL.csv when the data got into R the row would look something like this:

> City State Market etc Source
> Burbank IL Western etc Chicago_IL

Answer

You have already done all the hard work. With a fairly small modification this should be straight-forward.

The logic is:

  1. Create a small helper function that reads an individual csv and adds a column with the file name.
  2. Call this helper function in llply()

The following should work:

read_csv_filename <- function(filename){
    ret <- read.csv(filename)
    ret$Source <- filename #EDIT
    ret
}

import.list <- ldply(filenames, read_csv_filename)

Note that I have proposed another small improvement to your code: read.csv() returns a data.frame - this means you can use ldply() rather than llply().