mchangun - 27 days ago
R Question

read.csv, header on first line, skip second line

I have a CSV file with two header rows. I want the first row to be the header and the second row to be discarded. If I run the following command:

data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE)


The first row becomes the header and the second row of the file becomes the first row of my data frame:

Xaaaaaaaaa X X.1 Xbbbbbbbbbb X.2 X.3
1 Date PX_LAST NA Date PX_LAST NA
2 31/12/2002 38.855 NA 31/12/2002 19.547 NA
3 02/01/2003 38.664 NA 02/01/2003 19.547 NA
4 03/01/2003 40.386 NA 03/01/2003 19.547 NA
5 06/01/2003 40.386 NA 06/01/2003 19.609 NA
6 07/01/2003 40.195 NA 07/01/2003 19.609 NA


I want to skip this second row of the CSV file and just get

X1.HK.Equity X X.1 X2.HK.Equity X.2 X.3
2 31/12/2002 38.855 NA 31/12/2002 19.547 NA
3 02/01/2003 38.664 NA 02/01/2003 19.547 NA
4 03/01/2003 40.386 NA 03/01/2003 19.547 NA
5 06/01/2003 40.386 NA 06/01/2003 19.609 NA
6 07/01/2003 40.195 NA 07/01/2003 19.609 NA


I tried:

data <- read.csv("HK Stocks bbg.csv", header = T, stringsAsFactors = FALSE, skip = 1)

but that returns:

Date PX_LAST X Date.1 PX_LAST.1 X.1
1 31/12/2002 38.855 NA 31/12/2002 19.547 NA
2 02/01/2003 38.664 NA 02/01/2003 19.547 NA
3 03/01/2003 40.386 NA 03/01/2003 19.547 NA
4 06/01/2003 40.386 NA 06/01/2003 19.609 NA
5 07/01/2003 40.195 NA 07/01/2003 19.609 NA
6 08/01/2003 40.386 NA 08/01/2003 19.547 NA


The header row comes from the second line of my CSV file, not the first line.

Thank you.

Answer

This should do the trick:

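# Read every line of the file as text (replace "file.csv" with your own path)
all_content <- readLines("file.csv")
# Drop the second line; a negative index excludes that element
skip_second <- all_content[-2]
dat <- read.csv(textConnection(skip_second), header = TRUE, stringsAsFactors = FALSE)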

The first step uses readLines to read the entire file into a character vector, with one element per line of the file. Next, the second line is discarded using negative indexing, which in R means "select everything except this index". Finally, the remaining lines are passed to read.csv through textConnection, which parses them into a data.frame.
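
To try the approach without the original file, here is a minimal self-contained sketch. The sample rows and the temporary file are made up for illustration and only imitate the layout shown in the question; the real file may be quoted or delimited differently. It also uses the text argument of read.csv in place of an explicit textConnection, which should behave the same way here.

```r
# Write a small two-header CSV to a temporary file.
# These rows are invented sample data, not the asker's real file.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "1 HK Equity,,,2 HK Equity,,",
  "Date,PX_LAST,,Date,PX_LAST,",
  "31/12/2002,38.855,,31/12/2002,19.547,",
  "02/01/2003,38.664,,02/01/2003,19.547,"
), path)

# readLines returns a character vector with one element per line.
all_content <- readLines(path)

# A negative index keeps everything except that position,
# so this drops the second header row only.
skip_second <- all_content[-2]

# Parse the remaining lines; the first line supplies the column names.
dat <- read.csv(text = skip_second, header = TRUE, stringsAsFactors = FALSE)

str(dat)
unlink(path)
```

Because the whole file is read into memory as text before parsing, this is simplest for files of moderate size.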