Raphael - 2 months ago 11x
R Question

# How to estimate multilevel Poisson models in lme4 with large count values?

The case:
I am trying to estimate gravity models of migration (

`mig`
) from 55 origin districts (
`IDo`
) to 54 destination districts (
`IDd`
). Two important predictors are the distance (
`dist`
) between each origin and destination district and an indicator variable of the contiguity (
`contig`
) of any origin-destination pair. The migration outcome variable is a measure of the count of people migrating between pairs of districts (ranging from zero to a couple of thousands). Below is a test data set that approximates my data situation. In the real data, the migration outcome variable is slightly less ideally poisson distributed (
`min=0, max=9450, mean=85, median=10`
).

``````library(lme4)

#*** Generate test data set
set.seed=777
td=data.frame(IDo=rep(1:55,each=55),IDd=rep(1:55,times=55),dist=runif(3025,0.186,12.7),contig=rbinom(3025,1,p=0.08), stringsAsFactors=F)
td=td[td\$IDo!=td\$IDd,]              # remove cases for which origin and destination are the same
td=rbind(td,td)                     # dublicate data for two years
td\$year=rep(c(1,0),each=2970)       # generate year dummy variable
td\$mig=rpois(5940, lambda = 1)*1000 # generate migrant count variable

# Cross-classified random effects Poisson models
m1=formula(mig~dist+contig+year+(1|IDo)+(1|IDd))
fm1=glmer(m1,data=td,family="poisson",control=glmerControl(optimizer="bobyqa"), nAGQ=1) # Laplace Approximation

# Regular Poisson models
m2=formula(mig~dist+contig+year+factor(IDo)+factor(IDd))
fm2=glm(m2,data=td,family="poisson")
``````

The problem:
I am running cross-classified multilevel Poisson models using
`lme4`
. When using the default nAGQ=1 setting (
`fm1`
), I always get the following warning message, and in the real data case, the model fails to converge (
`In checkConv(attr(opt, "derivs"), opt\$par, control\$checkConv Model failed to converge with max|grad| = 0.00248588 (tol = 0.001, component 1)`
).

``````Warning message:
In checkConv(attr(opt, "derivs"), opt\$par, ctrl = control\$checkConv,  :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
``````

However, this problem does not emerge with setting of nAGQ=0 (
`fm0`
) or when running it as a regular fixed effects model (
`fm2`
).

Does anyone have a suggestion what to do to avoid the problem that underlies this warning message (and leads to convergence issues in the real data)? The predictor variables have all a reasonably small scale, so no need to rescale these variables. I could potentially rescale my outcome variable to have smaller values (
`td\$mig=round(td\$mig/1000,0)`
) but this would change the distribution and should not be done as indicated in this post.

I really think this is a false positive; it's unusual to have count data with such large counts ... have you tried fitting with different optimizers (see `?lme4::convergence`) ? (I just tried your example with `control=glmerControl(optim="nloptwrap")` and got near-identical results.)

I'm a little worried about your distributional assumptions. I know it's just a reproducible example, but drawing a Poisson sample and multiplying it by 1000 does not give you a Poisson-distributed value ... here's a little posterior predictive simulation that simulates the distribution of the 90th quantile of the response under the fitted model -- which is way different from the observed value ...

``````ss <- simulate(fm1,1000,seed=101)
qq <- sapply(ss,quantile,0.9)
hist(qq,breaks=50,col="gray")
summary(qq)
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1106    1153    1167    1168    1182    1255
quantile(td\$mig,0.9) ## 2000
``````