mathkid - 8 months ago 77

R Question

I have a data frame as follows:

```
  x y
1 a d
2 b e
3 c f
```

Here, x and y are categorical variables. I want to generate a sparse matrix with a one-hot encoding of each categorical feature, namely x and y.

I did the following:

```
sparse.model.matrix(~.-1, z)
3 x 5 sparse Matrix of class "dgCMatrix"
  xa xb xc ye yf
1  1  .  .  .  .
2  .  1  .  1  .
3  .  .  1  .  1
```

I am facing two problems here:

1) I need zeros instead of dots, and

2) The level d of predictor y does not show up in the matrix, i.e. the column yd is missing.

Can someone please help me here?

Answer

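We may need to specify `contrasts.arg`. With the intercept removed, only the first factor keeps all of its levels; each later factor is coded with treatment contrasts, which drops its first level (here `yd`). Passing full indicator contrasts keeps every level, and wrapping the result in `as.matrix()` converts it to a dense matrix, so the dots (which are just implicit zeros in the sparse printout) show up as 0.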

```
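library(Matrix)  # provides sparse.model.matrix()
# contrasts = FALSE yields a full indicator matrix per factor, so no level is dropped
as.matrix(sparse.model.matrix(~ . - 1, z, contrasts.arg = lapply(z,
    function(x) contrasts(factor(x), contrasts = FALSE))))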
```
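For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The question does not show how `z` was created, so the `data.frame()` call below is an assumption, and `full_contrasts`, `sp` and `dense` are names made up for illustration.

```r
library(Matrix)

# Assumed reconstruction of the question's data frame `z`.
z <- data.frame(x = c("a", "b", "c"),
                y = c("d", "e", "f"),
                stringsAsFactors = TRUE)

# One full (non-contrast) indicator matrix per factor column,
# so that no level is dropped for any predictor.
full_contrasts <- lapply(z, function(col) contrasts(factor(col), contrasts = FALSE))

# Sparse one-hot encoding; prints dots where the entries are zero.
sp <- sparse.model.matrix(~ . - 1, data = z, contrasts.arg = full_contrasts)

# Dense copy; the implicit zeros are now printed as 0.
dense <- as.matrix(sp)
print(dense)
```

If the data is large, it is probably better to keep working with `sp`: the dots are only how sparse matrices print, and the underlying values are already zero. Converting with `as.matrix()` is mainly useful for display or for functions that require a dense matrix.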

Source (Stackoverflow)