Arndt - 4 months ago
R Question

When importing CSV into R how to generate column with name of the CSV?

I have a large number of csv files that I want to read into R. All the column headings in the csvs are the same. At first I thought I would need to create a loop based on the list of file names, but after searching I found a faster way. This reads in and combines all the csvs correctly (as far as I know).

filenames <- list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

library(plyr)  # provides llply()

import.list <- llply(filenames, read.csv)

combined <- do.call("rbind", import.list)

The only problem is that I want to know which csv a specific row of data comes from. I want a column labeled 'Source' that contains the name of the csv that the particular row came from. For example, if the csv was called Chicago_IL.csv, then once the data is in R the row would look something like this:

> City State Market etc Source
> Burbank IL Western etc Chicago_IL


You have already done all the hard work. With a fairly small modification this should be straightforward.

The logic is:

  1. Create a small helper function that reads an individual csv and adds a column with the file name.
  2. Call this helper function in llply()

The following should work:

read_csv_filename <- function(filename){
    ret <- read.csv(filename)
    ret$Source <- filename  # record which file each row came from
    ret                     # return the augmented data.frame
}

import.list <- ldply(filenames, read_csv_filename)

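Note that I have proposed another small improvement to your code: read.csv() returns a data.frame - this means you can use ldply() rather than llply(). ldply() binds the per-file data.frames into a single data.frame, so the separate do.call("rbind", ...) step is no longer needed.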
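
If you want the Source column to match the example in the question (Chicago_IL rather than Chicago_IL.csv), one option is to strip the directory and extension inside the helper. The following is only a sketch under some assumptions: the two demo files, the demo_dir directory and the helper name read_csv_source are made up for illustration, and it uses base R (lapply() plus do.call(rbind, ...)) in place of ldply() so that it runs without plyr installed.

```r
# Hypothetical demo data: two small CSVs written to a fresh temporary directory.
demo_dir <- tempfile("csv_source_demo")
dir.create(demo_dir)
write.csv(data.frame(City = c("Burbank", "Cicero"), State = "IL"),
          file.path(demo_dir, "Chicago_IL.csv"), row.names = FALSE)
write.csv(data.frame(City = "Gary", State = "IN"),
          file.path(demo_dir, "Gary_IN.csv"), row.names = FALSE)

# Match only .csv files and keep full paths so read.csv() can find them.
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Variant of the helper: Source holds the file name without directory or extension.
read_csv_source <- function(filename) {
  ret <- read.csv(filename, stringsAsFactors = FALSE)
  ret$Source <- tools::file_path_sans_ext(basename(filename))
  ret
}

# Base R stand-in for ldply(): read each file, then row-bind the results.
combined <- do.call(rbind, lapply(filenames, read_csv_source))
print(combined)
```

With these demo files, every row read from Chicago_IL.csv carries Chicago_IL in its Source column. Restricting pattern to "\\.csv$" also keeps list.files() from handing non-csv files in the directory to read.csv().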