Dominique Dominique - 2 months ago 9
R Question

How do you correctly perform a glmmPQL on non-normal data?

I ran a model using

looking at the effect that Year and Treatment had on the number of points covered with wood, then plotted the residuals to check for normality and the resulting graph is slightly skewed to the right. Is this normally distributed?

model <- glmer(Number~Year*Treatment(1|Year/Treatment), data=data,family=poisson)

enter image description here

This site recommends using
if your data is not normal:


family=gaussian(link = "log"),
na.action = na.pass,verbose=FALSE)

Now do you transform the data in the Excel document or in the R code (
Number1 <- log(Number)
) before running this model? Does the
imply that the data is already log transformed or does it imply that it will transform it?

If you have data with zeros, is it acceptable to add 1 to all observations to make it more than zero in order to log transform it:

sufficient to compare both models?

Many thanks for any advice!


tl;dr your diagnostic plots look OK to me, you can probably proceed to interpret your results.

  • This formula:


might not be quite right (besides the missing + between the terms above ...) In general you shouldn't include the same term in both the random and the fixed effects (although there is one exception - if Year has more than a few values and there are multiple observations per year you can include it as a continuous covariate in the fixed effects and a grouping factor in the random effects - so this might be correct).

  • I'm not crazy about the linked introduction; at a quick skim there's nothing horribly wrong with it, but there seem to b e a lot of minor inaccuracies and confusions. "Use glmmPQL if your data aren't Normal" is really shorthand for "you might want to use a GLMM if your data aren't Normal". Your glmer model should be fine.
  • interpreting diagnostic plots is a bit of an art, but the degree of deviation that you show above doesn't look like a problem.
  • since you don't need to log-transform your data, you don't need to get into the slightly messy issue of how to log-transform data containing zeros. In general log(1+x) transformations for count data are reasonable - but, again, unnecessary here.
  • anova() in this context does a likelihood ratio test, which is a reasonable way to compare models.