Guy Dubrovski - 10 days ago
R Question

R: after replacing read.csv with fread, an "incorrect number of dimensions" error appears

I was loading my csv file with plain:

baseData <- read.csv(datafile)


but as I want to load a larger dataset, I have moved to the data.table package:

baseData <- fread(input = paste("zcat < ", datafile, sep=""))


All seems to work fine, and the data loads much faster, but when I hit the following lines:

d <- baseData[baseData$some_prop==0,]
d <- d[!is.na(d[,"col"]) & (d[,"col"] == 0 | d[,"col"] == 1),]


I get the error:
incorrect number of dimensions


When using read.csv, everything works fine.
Any idea what could be going wrong?

Answer

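In a data.table, the j part of the subset is an expression that is evaluated, and its value is what you get back. Column names should therefore not be quoted: in older data.table versions (before 1.9.8, to my knowledge), a quoted name in j is just a string constant, so you get exactly that string back.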

Example:

> d <- data.table(A=1:6, B=5:10)
> d[,A]
[1] 1 2 3 4 5 6
> d[,B]
[1]  5  6  7  8  9 10
> d[,"B"]
[1] "B"

So for your particular case, removing the quotes around the column names should fix the error.
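As a rough sketch of what that could look like, here is the filter from the question with unquoted column names. The small baseData table below is a made-up stand-in (the real data is not shown in the question); only the column names some_prop and col are taken from it.

```r
library(data.table)

# Hypothetical stand-in for the data loaded with fread
baseData <- data.table(
  some_prop = c(0, 0, 0, 1, 0),
  col       = c(0, 1, NA, 1, 2)
)

# Inside [.data.table, columns are referred to by bare name,
# and the trailing comma is not needed when only filtering rows
d <- baseData[some_prop == 0]
d <- d[!is.na(col) & (col == 0 | col == 1)]

print(d)
```

The second filter could probably be shortened to d[col %in% c(0, 1)], since %in% never matches NA against 0 or 1.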

If your code is quite long and uses data.frame methods, you can call setDF(d) to convert the object to a plain data.frame, so the existing code works as-is until you refactor it.
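A minimal sketch of the setDF route, again on made-up data (the table contents are an assumption, not the asker's file). setDF converts the object in place, after which the original data.frame-style lines from the question run unchanged.

```r
library(data.table)

# Hypothetical stand-in for the table returned by fread
d <- data.table(some_prop = c(0, 0, 1), col = c(0, NA, 1))

# Convert to a plain data.frame by reference (no copy is made)
setDF(d)

# The data.frame-style subsetting from the question now behaves as before
d <- d[d$some_prop == 0, ]
d <- d[!is.na(d[, "col"]) & (d[, "col"] == 0 | d[, "col"] == 1), ]

print(class(d))
print(d)
```

If you would rather never get a data.table in the first place, fread also accepts data.table = FALSE, which should return a data.frame directly.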

To be complete, the error arises because your logical expression has length 1 ("col" == whatever returns just a single TRUE or FALSE), which does not match the number of rows of your data.table object.
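You can check this in plain R without any data at all. The sketch below assumes, as described above, that d[,"col"] evaluates to the string "col", and simply substitutes that string into the condition from the question.

```r
# What the condition reduces to when d[, "col"] is just the string "col"
cond <- !is.na("col") & ("col" == 0 | "col" == 1)

# A single logical value, regardless of how many rows the table has
print(length(cond))
print(cond)
```

The comparison "col" == 0 coerces 0 to the string "0", so both comparisons are FALSE and the whole condition collapses to one FALSE.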
