Raphael - 7 months ago 40

R Question

**The case:**

I am trying to estimate gravity models of migration (

`mig`

`IDo`

`IDd`

`dist`

`contig`

`min=0, max=9450, mean=85, median=10`

`library(lme4)`

#*** Generate test data set

set.seed=777

td=data.frame(IDo=rep(1:55,each=55),IDd=rep(1:55,times=55),dist=runif(3025,0.186,12.7),contig=rbinom(3025,1,p=0.08), stringsAsFactors=F)

td=td[td$IDo!=td$IDd,] # remove cases for which origin and destination are the same

td=rbind(td,td) # dublicate data for two years

td$year=rep(c(1,0),each=2970) # generate year dummy variable

td$mig=rpois(5940, lambda = 1)*1000 # generate migrant count variable

# Cross-classified random effects Poisson models

m1=formula(mig~dist+contig+year+(1|IDo)+(1|IDd))

fm0=glmer(m1,data=td,family="poisson",control=glmerControl(optimizer="bobyqa"), nAGQ=0) # Adaptive Gauss-Hermite Quadrature

fm1=glmer(m1,data=td,family="poisson",control=glmerControl(optimizer="bobyqa"), nAGQ=1) # Laplace Approximation

# Regular Poisson models

m2=formula(mig~dist+contig+year+factor(IDo)+factor(IDd))

fm2=glm(m2,data=td,family="poisson")

I am running cross-classified multilevel Poisson models using

`lme4`

`fm1`

`In checkConv(attr(opt, "derivs"), opt$par, control$checkConv Model failed to converge with max|grad| = 0.00248588 (tol = 0.001, component 1)`

`Warning message:`

In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :

Model is nearly unidentifiable: very large eigenvalue

- Rescale variables?

However, this problem does not emerge with setting of nAGQ=0 (

`fm0`

`fm2`

`td$mig=round(td$mig/1000,0)`

Answer

I really think this is a false positive; it's unusual to have count data with such large counts ... have you tried fitting with different optimizers (see `?lme4::convergence`

) ? (I just tried your example with `control=glmerControl(optim="nloptwrap")`

and got near-identical results.)

I'm a little worried about your distributional assumptions. I know it's just a reproducible example, but drawing a Poisson sample and multiplying it by 1000 does **not** give you a Poisson-distributed value ... here's a little posterior predictive simulation that simulates the distribution of the 90th quantile of the response under the fitted model -- which is *way* different from the observed value ...

```
ss <- simulate(fm1,1000,seed=101)
qq <- sapply(ss,quantile,0.9)
hist(qq,breaks=50,col="gray")
summary(qq)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1106 1153 1167 1168 1182 1255
quantile(td$mig,0.9) ## 2000
```