Gerd Marvin - 1 year ago 57

R Question

Assume I have a dataset containing two categorical predictor variables (a,b) and a binary target (y) variable.

```
df <- data.frame(
  a = factor(c("cat1","cat2","cat3","cat1","cat2")),
  b = factor(c("cat1","cat1","cat3","cat2","cat2")),
  y = factor(c(T,F,T,F,T))
)
```

The following logical relations exist in the data:

```
if (a = cat3) then (b = cat3 and y = true)
else if (a = b) then (y = true)
else y = false
```

I want to use `glm` [...] `glm` [...] `alias` [...]

However, it can happen, as in the dataset above, that a linear relationship exists between one of the reference codes generated for variable a and one of the reference codes generated for variable b.
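One way to confirm such a dependency before fitting is to look at the design matrix directly. The following is a minimal sketch, assuming the example `df` from above and R's default treatment contrasts; the names `X`, `rank_X` and `dup` are only illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  a = factor(c("cat1", "cat2", "cat3", "cat1", "cat2")),
  b = factor(c("cat1", "cat1", "cat3", "cat2", "cat2")),
  y = factor(c(TRUE, FALSE, TRUE, FALSE, TRUE))
)

# Design matrix for the predictors; with default contrasts this has the
# columns (Intercept), acat2, acat3, bcat2, bcat3.
X <- model.matrix(~ a + b, data = df)

# A rank smaller than the number of columns means that some columns are
# linear combinations of others.
rank_X <- qr(X)$rank
print(c(columns = ncol(X), rank = rank_X))

# In this data the dummy columns for a == "cat3" and b == "cat3" coincide,
# which is why glm cannot estimate both coefficients.
dup <- all(X[, "acat3"] == X[, "bcat3"])
print(dup)
```

If the rank equals the number of columns, no coefficient should be reported as `NA` because of singularities.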

See the output of my model:

```
> model <- glm(y ~ ., family=binomial(link='logit'), data=df)
> summary(model)
...
Coefficients: (1 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.965e-16  1.732e+00   0.000    1.000
acat2       -2.396e-16  2.000e+00   0.000    1.000
acat3        1.857e+01  6.523e+03   0.003    0.998
bcat2        0.000e+00  2.000e+00   0.000    1.000
bcat3               NA         NA      NA       NA   # <- get rid of this?
```

How should I handle this case?

Is there a way to tell glm to omit some of the generated reference codes?

In the real problem my `"cat3"` [...] `NA` [...] `NA` [...]

The checked answer solves the question. However, in this specific case the singularities can simply be ignored, as pointed out in the comments.


Answer Source

You could run it twice, removing the redundant model matrix columns on the second run:

```
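# First fit, as in the question; the aliased coefficient comes back as NA
model <- glm(y ~ ., family=binomial(link='logit'), data=df)
# Keep only the model matrix columns whose coefficients were estimated
mm <- model.matrix(model)[, !is.na(coef(model)) ]
# Drop the intercept column (glm adds it again) and reattach the response
df0 <- data.frame(y = df$y, mm[, -1])
# Refit the same call on the reduced data
update(model, data = df0)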
```

giving:

```
Call: glm(formula = y ~ ., family = binomial(link = "logit"), data = df0)
Coefficients:
(Intercept)        acat2        acat3        bcat2
  1.965e-16   -2.396e-16    1.857e+01    0.000e+00
Degrees of Freedom: 4 Total (i.e. Null); 1 Residual
Null Deviance: 6.73
Residual Deviance: 5.545 AIC: 13.55
```
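To see why the singularity can also simply be ignored here, the two fits can be compared. This is only a sketch under the assumption that the example `df` from the question is used; `model0`, `same_fit` and `same_dev` are illustrative names, not part of the original answer.

```r
# Example data from the question.
df <- data.frame(
  a = factor(c("cat1", "cat2", "cat3", "cat1", "cat2")),
  b = factor(c("cat1", "cat1", "cat3", "cat2", "cat2")),
  y = factor(c(TRUE, FALSE, TRUE, FALSE, TRUE))
)

# Original fit: one coefficient is NA because its column is aliased.
model <- glm(y ~ ., family = binomial(link = "logit"), data = df)

# Reduced fit: same steps as above, keeping only the estimable columns.
mm <- model.matrix(model)[, !is.na(coef(model))]
df0 <- data.frame(y = df$y, mm[, -1])
model0 <- update(model, data = df0)

# Compare fitted probabilities and deviance of both fits. The aliased
# column adds no information, so the two fits are expected to agree.
same_fit <- isTRUE(all.equal(unname(fitted(model)), unname(fitted(model0)),
                             tolerance = 1e-6))
same_dev <- isTRUE(all.equal(deviance(model), deviance(model0),
                             tolerance = 1e-6))
print(c(same_fit = same_fit, same_dev = same_dev))
```

If both checks come back `TRUE`, the refit only tidies up the coefficient table; it does not change the fitted model.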
