mathkid - 1 month ago
R Question

generating a sparse matrix for a categorical variable

I have a data frame, z, as follows:

x y
1 a d
2 b e
3 c f


Here x and y are categorical variables. I want to generate a sparse matrix with one-hot encoding for each of the categorical features, x and y.

I ran the following:

sparse.model.matrix(~.-1,z)
3 x 5 sparse Matrix of class "dgCMatrix"
xa xb xc ye yf
1 1 . . . .
2 . 1 . 1 .
3 . . 1 . 1


I am facing two problems here:

1) I need zeros instead of dots, and

2) The level d of predictor y does not show up in the matrix, i.e. the column yd is missing.

Can someone please help me here?

Answer

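With the intercept removed, the default treatment contrasts keep every level of the first factor (x) but still drop the first level of each later factor, which is why yd is missing. Passing full (identity) contrasts for every factor through contrasts.arg keeps all levels. The dots are simply how a sparse matrix prints its zeros; wrapping the result in as.matrix() converts it to a dense matrix that shows 0 explicitly.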

library(Matrix)
# contrasts = FALSE gives an identity contrast matrix, so no level is dropped
as.matrix(sparse.model.matrix(~.-1, z, contrasts.arg = lapply(z,
          function(x) contrasts(factor(x), contrasts = FALSE))))
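For anyone who wants to try this end to end, here is a minimal sketch. The data frame z is rebuilt by hand from the printout in the question (the original construction was not shown, so that part is an assumption), and the names full_contrasts, sparse_m and dense_m are only illustrative.

```r
library(Matrix)

# Assumed reconstruction of the question's data frame; both columns are factors
z <- data.frame(x = c("a", "b", "c"),
                y = c("d", "e", "f"),
                stringsAsFactors = TRUE)

# One identity contrast matrix per column, so every level keeps its own indicator
full_contrasts <- lapply(z, function(col) contrasts(factor(col), contrasts = FALSE))

# Sparse one-hot matrix with no intercept and no dropped levels
sparse_m <- sparse.model.matrix(~ . - 1, data = z, contrasts.arg = full_contrasts)

# Dense copy: prints 0 where the sparse print method shows "."
dense_m <- as.matrix(sparse_m)
print(dense_m)
```

The result should have six columns, one per level of x and y, including the previously missing one for level d. Note that as.matrix() gives up the memory savings of the sparse format, so for large data it may be better to keep sparse_m and treat the dots as a display detail only.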