FaridCher - 1 month ago
R Question

Matrix subsetting by column name using the `subset` function

Consider the following simulation snippet:

k <- 1:5
x <- seq(0,10,length.out = 100)
dsts <- lapply(1:length(k), function(i) cbind(x=x, distri=dchisq(x,k[i]),i) )
dsts <- do.call(rbind,dsts)

Why does this code throw an error (dsts is a matrix)?

subset(dsts, i == 1)
#Error in subset.matrix(dsts, i == 1) : object 'i' not found

Even this one:

colnames(dsts)[3] <- 'iii'
subset(dsts, iii == 1)

But not this one (matrix coerced to a data frame):

subset(as.data.frame(dsts), i == 1)

This one also works, where `x` is already defined:

subset(dsts,x> 500)

The error occurs in `subset.matrix` on this line:

else if (!is.logical(subset))

Is this a bug that should be reported to R Core?


The behavior you are describing is by design and is documented on the ?subset help page.

From the help page:

For data frames, the subset argument works on the rows. Note that subset will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression (see the examples).
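To see what that passage means in practice, here is a minimal sketch with small made-up objects (`m`, `df`, and the global `x` are illustrative names, not the data from the question). For a data frame the condition is evaluated with the columns in scope. For a matrix the condition is evaluated like any ordinary argument, so names are looked up in the calling environment instead.

```r
# Small illustrative objects (not the asker's data)
m  <- cbind(x = c(1, 2, 3, 4), i = c(1, 1, 2, 2))
df <- as.data.frame(m)

# Data frame method: 'i' is looked up among the columns of df
res_df <- subset(df, i == 1)

# Matrix method: 'i' is looked up in the calling environment.
# No such variable exists there, so the call fails; the error
# message is captured instead of stopping the script.
res_m <- tryCatch(subset(m, i == 1),
                  error = function(e) conditionMessage(e))

# If a variable with that name does exist outside the matrix, it is
# used instead of the column. That is why subset(dsts, x > 500) ran
# without an error in the question.
x <- c(10, 20, 30, 40)
res_x <- subset(m, x > 15)   # uses the global x, not m[, "x"]
```

Note that the last call silently filters on the wrong values, which is arguably worse than the error.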

In R, data.frames and matrices are very different types of objects. If this is causing a problem, you are probably using the wrong data structure for your data. Matrices are really only necessary if you need matrix arithmetic. If you think of your columns as different attributes of a row observation, then you should be storing your data in a data.frame in the first place. You could store all your values in a simple vector where every three values represent one observation, but that would also be a poor choice of data structure. I'm not sure whether you were trying to be more efficient by choosing a matrix, but it seems like the wrong choice here.

A data.frame is stored as a named list, while a matrix is stored as a vector with a dim attribute. A list can be used as an environment, which makes it easy to evaluate variable names in that context. The biggest difference between the two is that data.frames can hold columns of different classes (numerics, characters, dates), while matrices can only hold values of exactly one data type. You cannot always convert between the two without a loss of information.
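A short sketch of those differences, using made-up objects (`df` and `m` here are illustrative only):

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"))
m  <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b")))

is.list(df)   # TRUE: a data.frame is a list of columns
is.list(m)    # FALSE: a matrix is an atomic vector ...
dim(m)        # 3 2   ... with a dim attribute

# Because a data.frame is a list, its columns can act as variables
# when an expression is evaluated against it
in_df <- eval(quote(id > 1), df)

# Converting mixed columns to a matrix forces a single type,
# so the integer column is turned into character strings
as_m <- as.matrix(df)
typeof(as_m)  # "character"
```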

Things like $ also work only with data.frames, not with matrices.

dd <- data.frame(x=1:10)
mm <- matrix(1:10, ncol=1, dimnames=list(NULL, "x"))
dd$x # works: returns the column as a vector
mm$x # Error in mm$x : $ operator is invalid for atomic vectors

If you want to subset a matrix, you are better off using standard [ subsetting rather than the subset() function.

dsts[ dsts[,"i"]==1, ]
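As a slightly fuller sketch, the same idea applied to the matrix from the question (rebuilt here so the example runs on its own; `rows`, `first`, `one`, and `dropped` are names introduced only for illustration). When a filter might match a single row, `drop = FALSE` keeps the result a matrix:

```r
# Rebuild the matrix from the question
k <- 1:5
x <- seq(0, 10, length.out = 100)
dsts <- do.call(rbind, lapply(seq_along(k), function(i) {
  cbind(x = x, distri = dchisq(x, k[i]), i)
}))

# Logical row index built from a named column
rows  <- dsts[, "i"] == 1
first <- dsts[rows, ]                  # 100 x 3 matrix

# Conditions can be combined; drop = FALSE keeps a one-row result
# as a matrix instead of collapsing it to a plain vector
one_row <- dsts[, "i"] == 1 & dsts[, "x"] == 0
one     <- dsts[one_row, , drop = FALSE]
dropped <- dsts[one_row, ]             # named numeric vector, no dim
```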

This behavior has been a part of R for a very long time. Any change to this behavior is likely to break existing code that relies on variables being evaluated in a certain context. I think the problem lies with whoever told you to use a matrix in the first place. Rather than cbind(), you should have used data.frame().
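A minimal sketch of that change, assuming the same simulation as in the question (`dsts_df` and `first` are names introduced here for illustration):

```r
k <- 1:5
x <- seq(0, 10, length.out = 100)

# Build one data.frame per value of k, then stack them
dsts_df <- do.call(rbind, lapply(seq_along(k), function(i) {
  data.frame(x = x, distri = dchisq(x, k[i]), i = i)
}))

# subset() now evaluates 'i' among the columns of dsts_df
first <- subset(dsts_df, i == 1)
```

With the data in a data.frame, the original `subset(dsts, i == 1)` call works as the asker expected.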