Joe Joe - 23 days ago 12
R Question

How to skip the second line in a csv file while keeping the first line as column names with read_csv?

Qualtrics generates csv files with variable names in the first line and variable labels in the second line. I'd like to use read_csv() to read in my data while reading the first line as column names and then skipping the next line of variable labels. Below is my failed attempt.

library(readr)
mydata <- read_csv("qualtrics_data.csv", col_names = TRUE, skip = 2) # this would actually skip both the names and label rows.

Answer

You can read the file twice: once to get the column names, and once more to get the data.

library(readr)
library(dplyr)

# Example data: line 1 holds the names, line 2 stands in for the label row.
csv_file <- "mpg,cyl,disp,hp,drat,wt
mpg,cyl,disp,hp,drat,wt
21.0,6,160,110,3.90,2.875
22.8,4,108,93,3.85,2.320
21.4,6,258,110,3.08,3.215
18.7,8,360,175,3.15,3.440
18.1,6,225,105,2.76,3.460"


# First pass: read zero data rows, only to capture the column names from line 1.
df_names <- read_csv(csv_file, n_max = 0) %>% names()

df_names
#> [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"

# Second pass: skip the name line and the label line, then supply the names explicitly.
df <- read_csv(csv_file, col_names = df_names, skip = 2)

df

#> # A tibble: 5 x 6
#>     mpg   cyl  disp    hp  drat    wt
#>   <dbl> <int> <int> <int> <dbl> <dbl>
#> 1  21.0     6   160   110  3.90 2.875
#> 2  22.8     4   108    93  3.85 2.320
#> 3  21.4     6   258   110  3.08 3.215
#> 4  18.7     8   360   175  3.15 3.440
#> 5  18.1     6   225   105  2.76 3.460
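
If you need this for several files, the two passes above could be wrapped in a small function. This is only a sketch: read_csv_skip_labels and its n_label_rows argument are made-up names for illustration, not part of readr, and it assumes each label row occupies exactly one physical line (no line breaks inside a quoted label).

```r
library(readr)

# Hypothetical helper (not part of readr): read a CSV whose first line holds
# the column names and whose following n_label_rows lines should be discarded.
read_csv_skip_labels <- function(file, n_label_rows = 1, ...) {
  # First pass: read zero data rows to capture the names from line 1.
  col_names <- names(read_csv(file, n_max = 0))
  # Second pass: skip the name line plus the label lines, then reapply the names.
  read_csv(file, col_names = col_names, skip = 1 + n_label_rows, ...)
}

# Usage with made-up inline data: names on line 1, labels on line 2.
csv_text <- "mpg,cyl
Miles per gallon,Cylinders
21.0,6
22.8,4"

df <- read_csv_skip_labels(csv_text)
```

Because the label line is never parsed as data, the columns should be guessed from the actual values instead of being read as character. If a file has more than one extra header line, pass a larger n_label_rows.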