coulminer coulminer - 23 days ago 10
R Question

Printing all glm coefficients in R

How do I print glm coefficients for all factor levels, including reference level?
summary(glm_obj) prints only the values that deviate from reference values.

I know that those are 0's, but I need this for integration, i.e. telling other systems what factor levels can happen at all.

Sorry if it's too simple, could not find anywhere.

Thanks

To illustrate the problem I am facing:

> glm(Petal.Width~Species,data=iris)

Call: glm(formula = Petal.Width ~ Species, data = iris)

Coefficients:
(Intercept) Speciesversicolor Speciesvirginica
0.246 1.080 1.780

Degrees of Freedom: 149 Total (i.e. Null); 147 Residual
Null Deviance: 86.57
Residual Deviance: 6.157 AIC: -45.29`


The model description above yields only coefficients for versicolor and virginica, which is, as Dason has noted, absolutely fine from the point of view of the model itself.

However, I needed to share the model with another application, which must know what levels of Species to expect (and e.g. issue a warning in once a new, unstudied level appears).

Summary() gives the same results:

> summary(glm(Petal.Width~Species,data=iris))

Call:
glm(formula = Petal.Width ~ Species, data = iris)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.626 -0.126 -0.026 0.154 0.474

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.24600 0.02894 8.50 1.96e-14 ***
Speciesversicolor 1.08000 0.04093 26.39 < 2e-16 ***
Speciesvirginica 1.78000 0.04093 43.49 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.04188163)

Null deviance: 86.5699 on 149 degrees of freedom
Residual deviance: 6.1566 on 147 degrees of freedom
AIC: -45.285

Number of Fisher Scoring iterations: 2

Answer

Answered to myself as I think this method fits the purpose better than what has been proposed.

Sharing predictive models is not what the summary.glm method supposed to do, therefore summary(model) does not say much about the data the model is applied to.

There is a solution though -- use PMML, it allows describing both, the model, and the data it should be applied to.

Example:

> library(pmml)
> pmml(glm(Petal.Width~Species,data=iris))
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2014 dmitrijsl" description="Generalized Linear     Regression Model">
  <Extension name="user" value="dmitrijsl" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-07-15 15:07:51</Timestamp>
 </Header>
 <DataDictionary numberOfFields="2">
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  <DataField name="Species" optype="categorical" dataType="string">
   <Value value="setosa"/>
   <Value value="versicolor"/>
   <Value value="virginica"/>
  </DataField>
...

Now also Setosa is in the list for the receiver system to know what to expect, and the model description is there:

...
<ParameterList>
 <Parameter name="p0" label="(Intercept)"/>
 <Parameter name="p1" label="Speciesversicolor"/>
 <Parameter name="p2" label="Speciesvirginica"/>
</ParameterList>
<FactorList>
 <Predictor name="Species"/>
</FactorList>
<CovariateList/>
<PPMatrix>
 <PPCell value="versicolor" predictorName="Species" parameterName="p1"/>
 <PPCell value="virginica" predictorName="Species" parameterName="p2"/>
</PPMatrix>
<ParamMatrix>
 <PCell parameterName="p0" df="1" beta="0.245999999999997"/>
 <PCell parameterName="p1" df="1" beta="1.08"/>
 <PCell parameterName="p2" df="1" beta="1.78"/>
</ParamMatrix>

Comments