Dominique - 3 months ago 13
R Question

# How do you correctly perform a glmmPQL on non-normal data?

I ran a model using

`glmer`
looking at the effect that Year and Treatment had on the number of points covered with wood, then plotted the residuals to check for normality and the resulting graph is slightly skewed to the right. Is this normally distributed?

``````model <- glmer(Number~Year*Treatment(1|Year/Treatment), data=data,family=poisson)
``````

This site recommends using
`glmmPQL`

``````library(MASS)
library(nlme)

model1<-glmmPQL(Number~Year*Treatment,~1|Year/Treatment,
data=data,start=coef(lm(Log~Year*Treatment)),
na.action = na.pass,verbose=FALSE)
summary(model1)
plot(model1)
``````

Now do you transform the data in the Excel document or in the R code (
`Number1 <- log(Number)`
) before running this model? Does the
`link="log"`
imply that the data is already log transformed or does it imply that it will transform it?

If you have data with zeros, is it acceptable to add 1 to all observations to make it more than zero in order to log transform it:
`Number1<-log(Number+1)`
?

Is
`fit<-anova(model,model1,test="Chisq")`
sufficient to compare both models?

tl;dr your diagnostic plots look OK to me, you can probably proceed to interpret your results.

• This formula:

``````Number~Year*Treatment+(1|Year/Treatment)
``````

might not be quite right (besides the missing `+` between the terms above ...) In general you shouldn't include the same term in both the random and the fixed effects (although there is one exception - if `Year` has more than a few values and there are multiple observations per year you can include it as a continuous covariate in the fixed effects and a grouping factor in the random effects - so this might be correct).

• I'm not crazy about the linked introduction; at a quick skim there's nothing horribly wrong with it, but there seem to b e a lot of minor inaccuracies and confusions. "Use `glmmPQL` if your data aren't Normal" is really shorthand for "you might want to use a GLMM if your data aren't Normal". Your `glmer` model should be fine.
• interpreting diagnostic plots is a bit of an art, but the degree of deviation that you show above doesn't look like a problem.
• since you don't need to log-transform your data, you don't need to get into the slightly messy issue of how to log-transform data containing zeros. In general `log(1+x)` transformations for count data are reasonable - but, again, unnecessary here.
• `anova()` in this context does a likelihood ratio test, which is a reasonable way to compare models.