Dominique Dominique - 3 months ago 13
R Question

How do you correctly perform a glmmPQL on non-normal data?

I ran a model using

glmer
looking at the effect that Year and Treatment had on the number of points covered with wood, then plotted the residuals to check for normality and the resulting graph is slightly skewed to the right. Is this normally distributed?

model <- glmer(Number~Year*Treatment(1|Year/Treatment), data=data,family=poisson)


enter image description here

This site recommends using
glmmPQL
if your data is not normal: http://ase.tufts.edu/gsc/gradresources/guidetomixedmodelsinr/mixed%20model%20guide.html

library(MASS)
library(nlme)

model1<-glmmPQL(Number~Year*Treatment,~1|Year/Treatment,
family=gaussian(link = "log"),
data=data,start=coef(lm(Log~Year*Treatment)),
na.action = na.pass,verbose=FALSE)
summary(model1)
plot(model1)


Now do you transform the data in the Excel document or in the R code (
Number1 <- log(Number)
) before running this model? Does the
link="log"
imply that the data is already log transformed or does it imply that it will transform it?

If you have data with zeros, is it acceptable to add 1 to all observations to make it more than zero in order to log transform it:
Number1<-log(Number+1)
?

Is
fit<-anova(model,model1,test="Chisq")
sufficient to compare both models?

Many thanks for any advice!

Answer

tl;dr your diagnostic plots look OK to me, you can probably proceed to interpret your results.

  • This formula:

    Number~Year*Treatment+(1|Year/Treatment)
    

might not be quite right (besides the missing + between the terms above ...) In general you shouldn't include the same term in both the random and the fixed effects (although there is one exception - if Year has more than a few values and there are multiple observations per year you can include it as a continuous covariate in the fixed effects and a grouping factor in the random effects - so this might be correct).

  • I'm not crazy about the linked introduction; at a quick skim there's nothing horribly wrong with it, but there seem to b e a lot of minor inaccuracies and confusions. "Use glmmPQL if your data aren't Normal" is really shorthand for "you might want to use a GLMM if your data aren't Normal". Your glmer model should be fine.
  • interpreting diagnostic plots is a bit of an art, but the degree of deviation that you show above doesn't look like a problem.
  • since you don't need to log-transform your data, you don't need to get into the slightly messy issue of how to log-transform data containing zeros. In general log(1+x) transformations for count data are reasonable - but, again, unnecessary here.
  • anova() in this context does a likelihood ratio test, which is a reasonable way to compare models.
Comments