Pritish Kakodkar - 3 months ago
R Question

Selecting the statistically significant variables in an R glm model

I have an outcome variable, say Y, and a list of 100 dimensions that could affect Y (say X1...X100).

After running my glm and viewing a summary of my model, I see those variables that are statistically significant. I would like to be able to select those variables and run another model and compare performance. Is there a way I can parse the model summary and select only the ones that are significant?

Answer

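You can access the p-values of a glm fit through the function "summary". The last column of the coefficients matrix holds the p-value of each term in the model. It is named "Pr(>|t|)" for the default gaussian family and "Pr(>|z|)" for families such as binomial or poisson, so indexing it by position (column 4) is safer than indexing it by name.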

Here's an example:

# x is a 10 x 3 matrix of random predictors
x <- matrix(rnorm(3 * 10), ncol = 3)
y <- rnorm(10)
res <- glm(y ~ x)
# p-values are in column 4; drop row 1 to ignore the intercept
coef(summary(res))[-1, 4] < 0.05
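
To go from that logical vector to the second model the question asks about, one possible sketch is below. The data frame dat, its columns X1..X10 and Y, and the 0.05 cutoff are all made up for illustration. It also assumes every predictor is numeric, so that coefficient names match column names. With factor predictors the coefficient names include the level (for example "groupB") and would need mapping back to the variable.

```r
# Hypothetical simulated data: 200 rows, predictors X1..X10; only X1 and X2 drive Y.
set.seed(42)
n <- 200
dat <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(dat) <- paste0("X", 1:10)
dat$Y <- 2 * dat$X1 - 1.5 * dat$X2 + rnorm(n)

# Full model on all predictors (default gaussian family).
full <- glm(Y ~ ., data = dat)

# Coefficient table; column 4 holds the p-values, row 1 is the intercept.
pvals <- coef(summary(full))[-1, 4]
sig_vars <- names(pvals)[pvals < 0.05]

# Refit using only the significant predictors
# (assumes at least one predictor passed the cutoff).
reduced <- glm(reformulate(sig_vars, response = "Y"), data = dat)

# Compare the two fits: information criterion and a nested-model F test.
AIC(full, reduced)
anova(reduced, full, test = "F")
```

reformulate() builds the formula from the character vector of names, which avoids pasting strings together by hand. For a non-gaussian family, pass the same family argument to both glm calls and use test = "Chisq" in anova.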