MichaelChirico - 1 month ago
R Question

drop in fread: misses repetitions of col name (data.table R)

I've got a file with a bunch of filler columns (named, of course, filler) that I'm trying to read with fread.

I'm using the drop argument, but it only drops the first instance it encounters (presumably going left to right, though that is irrelevant here); I want it to get rid of all of them.

Quick example:

Header of the .csv:

id,first_name,last_name,filler,birth_year,filler,position,filler,wage


names(dt) after using drop in fread:

id,first_name,last_name,birth_year,filler,position,filler,wage


Further, if I just try:

DT <- fread("file.csv", drop = rep("filler", 5L))


I get an error:


Error in fread(paste0(substr(tt, 3, 4), "staff.csv"), drop = rep("filler",  :
  Duplicates detected in drop


Any pointers?

Answer

You could read the first line of the file with scan(), find which column positions are named filler, and then pass those positions as the drop indices in fread():

library(data.table)
## example text for fread()
x <- "id,first_name,last_name,filler,birth_year,filler,position,filler,wage
1,2,3,4,5,6,7,8,9"
## read the first line and find the filler
f <- scan(text = x, what = "", sep = ",", nlines = 1) == "filler"
## pass to fread()
fread(x, drop = which(f))
#    id first_name last_name birth_year position wage
# 1:  1          2         3          5        7    9
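
For a file on disk rather than an inline string, the same idea might be wrapped up roughly as follows. This is only a sketch under a few assumptions: the helper name fread_drop_all is hypothetical (it is not part of data.table), the file is comma-separated with the header on the first line, and the temporary file merely mirrors the header from the question.

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop every column whose
# header equals `name`, including repeated occurrences.
# Assumes a comma-separated file with the header on the first line.
fread_drop_all <- function(file, name = "filler", ...) {
  # read only the header line, as in the scan() call above
  header <- scan(file, what = "", sep = ",", nlines = 1, quiet = TRUE)
  idx <- which(header == name)
  # nothing to drop: read the file unchanged
  if (length(idx) == 0L) return(fread(file, ...))
  # drop by position, so duplicated names are not a problem
  fread(file, drop = idx, ...)
}

# Usage with a temporary file mirroring the header from the question
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,first_name,last_name,filler,birth_year,filler,position,filler,wage",
  "1,2,3,4,5,6,7,8,9"
), tmp)
DT <- fread_drop_all(tmp)
names(DT)
```

Dropping by position sidesteps the "Duplicates detected in drop" error, since the integer indices are all distinct even though the names are not.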