mathkid - 3 years ago
R Question

Generating a sparse matrix for categorical variables

I have a data frame as follows:

x y
1 a d
2 b e
3 c f

Here x and y are categorical variables. I want to generate a sparse matrix with one-hot encoding for each categorical feature, namely x and y.
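To make the target concrete: one-hot encoding a factor means one column per level and exactly one 1 per row, in the column of that row's level. Below is a minimal sketch of that structure built directly with Matrix::sparseMatrix(). It assumes the data frame above is rebuilt as z (the name the answer uses), and one_hot() is a hypothetical helper written only for illustration, not part of any package.

```r
library(Matrix)

# Hypothetical helper (illustration only): one-hot encode a single variable
# into a sparse matrix with one column per level and exactly one 1 per row.
one_hot <- function(f, prefix) {
  f <- factor(f)
  sparseMatrix(
    i = seq_along(f),          # row index: one entry per observation
    j = as.integer(f),         # column index: the observation's level
    x = 1,                     # value stored at (i, j), recycled to all entries
    dims = c(length(f), nlevels(f)),
    dimnames = list(NULL, paste0(prefix, levels(f)))
  )
}

# Assumed reconstruction of the question's data frame
z <- data.frame(x = c("a", "b", "c"), y = c("d", "e", "f"))

# Bind the per-variable blocks side by side: columns xa xb xc yd ye yf
m <- cbind(one_hot(z$x, "x"), one_hot(z$y, "y"))
m
```

Every level of both variables, including d, gets its own column here, which is the layout the question is after.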

I did the following:

3 x 5 sparse Matrix of class "dgCMatrix"
xa xb xc ye yf
1 1 . . . .
2 . 1 . 1 .
3 . . 1 . 1

I am facing two problems:

1) I need zeros instead of dots, and

2) level d of predictor y does not show up in the matrix, i.e. there is no yd column.
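Both symptoms can be reproduced directly. This is a minimal sketch, assuming the data frame is called z and that the matrix above came from sparse.model.matrix(~ . - 1, z), the call the answer builds on. By default R codes an unordered factor with treatment contrasts, so its first level becomes the baseline and gets no column. Dropping the intercept with - 1 only restores that level for the first factor (x), which is why xa is present but yd is not. The dots are not missing values: the Matrix package prints entries that are not stored, i.e. zeros, as ".".

```r
library(Matrix)

# Assumed reconstruction of the question's data frame, with factor columns
z <- data.frame(x = c("a", "b", "c"), y = c("d", "e", "f"), stringsAsFactors = TRUE)

# Default treatment contrasts for y: level "d" is the baseline and gets no column
treat <- contrasts(z$y)
colnames(treat)   # "e" "f"

# With contrasts = FALSE every level gets its own indicator column
full <- contrasts(z$y, contrasts = FALSE)
colnames(full)    # "d" "e" "f"

# Reproduce the question's matrix: no yd column, and "." is an unstored zero
m <- sparse.model.matrix(~ . - 1, z)
colnames(m)       # "xa" "xb" "xc" "ye" "yf"
m[1, "ye"]        # 0
```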

Can someone please help me here?

Answer

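We need to specify contrasts.arg so that no level is dropped as a reference level (this brings back yd), and wrap the result in as.matrix() so the zeros print as 0 instead of dots: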

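library(Matrix)  # provides sparse.model.matrix()
# contrasts = FALSE gives every level its own column; as.matrix() shows 0 instead of "."
as.matrix(sparse.model.matrix(~.-1, z, contrasts.arg = lapply(z,
          function(x) contrasts(factor(x), contrasts = FALSE))))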
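The one-liner can be unpacked step by step. This sketch assumes z holds the question's data with factor columns; the names full_contrasts, sm and dm are chosen here for illustration only. Note that as.matrix() returns an ordinary dense matrix, so for large data it may be better to keep the sparse result and treat the dots as the zeros they are.

```r
library(Matrix)

# Assumed reconstruction of the question's data frame, with factor columns
z <- data.frame(x = c("a", "b", "c"), y = c("d", "e", "f"), stringsAsFactors = TRUE)

# Full indicator coding (no dropped baseline level) for every column of z
full_contrasts <- lapply(z, function(x) contrasts(factor(x), contrasts = FALSE))

# Still sparse: zeros are stored implicitly and printed as "."
sm <- sparse.model.matrix(~ . - 1, z, contrasts.arg = full_contrasts)

# Dense copy for display: zeros are printed as 0
dm <- as.matrix(sm)
dm
```

dm has the columns xa, xb, xc, yd, ye and yf, with exactly one 1 per variable in each row.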
