dan - 1 year ago 204

R Question

I have binomial count data, coming from a set of conditions, that are overdispersed. To simulate them I use the beta-binomial distribution implemented by the `rbetabinom` function of the `emdbook` package in `R`:

```
library(emdbook)
set.seed(1)
df <- data.frame(p = rep(runif(3,0,1)),
                 n = as.integer(runif(30,100,200)),
                 theta = rep(runif(3,1,5)),
                 cond = rep(LETTERS[1:3],10),
                 stringsAsFactors=FALSE)
df$k <- sapply(1:nrow(df), function(x)
    rbetabinom(n=1, prob=df$p[x], size=df$n[x],
               theta=df$theta[x], shape1=1, shape2=1))
```

I want to find the effect of each condition (`cond`) on the counts (`k`).

I think the `glm.nb` function of the `MASS` package in `R` is appropriate for this:

```
library(MASS)
fit <- glm.nb(k ~ cond + offset(log(n)), data = df)
```

My question is how to set the contrasts such that I get the effect of each condition relative to the mean effect over all conditions, rather than relative to the dummy reference condition `A`.


Answer Source

Two things: (1) if you want contrasts relative to the mean, use `contr.sum` rather than the default `contr.treatment`; (2) you probably shouldn't fit beta-binomial data with a negative binomial model; use a beta-binomial model instead (e.g. via `VGAM` or `bbmle`)!
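As a quick illustration of what `contr.sum` does, here is the sum-to-zero contrast matrix for a three-level factor (base `R`):

```
## each of the first two levels gets its own column; the last
## level is coded -1 in every column, so effects sum to zero
contr.sum(3)
##   [,1] [,2]
## 1    1    0
## 2    0    1
## 3   -1   -1
```

With this coding the intercept estimates the mean over all levels, and each coefficient is a level's deviation from that mean.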

```
library(emdbook)
set.seed(1)
df <- data.frame(p = rep(runif(3,0,1)),
                 n = as.integer(runif(30,100,200)),
                 theta = rep(runif(3,1,5)),
                 cond = rep(LETTERS[1:3],10),
                 stringsAsFactors=FALSE)
## slightly abbreviated (vectorized over rows)
df$k <- rbetabinom(n=nrow(df), prob=df$p,
                   size=df$n, theta=df$theta, shape1=1, shape2=1)
```

With `VGAM`:

```
library(VGAM)
## note: dbetabinom/rbetabinom from emdbook are masked
options(contrasts=c("contr.sum","contr.poly"))
vglm(cbind(k,n-k) ~ cond, data=df,
     family=betabinomialff(zero=2)  ## hold shape parameter 2 constant
)
## Coefficients:
## (Intercept):1 (Intercept):2         cond1         cond2
##     0.4312181     0.5197579    -0.3121925     0.3011559
## Log-likelihood: -147.7304
```

Here the *intercept* is the mean shape parameter across the levels; `cond1` and `cond2` are the differences of levels 1 and 2 from the mean (this doesn't give you the difference of level 3 from the mean directly, but by construction it should be `-cond1-cond2` ...).
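For example, plugging in the two reported contrasts from the `vglm` output above, the level-3 effect follows from the sum-to-zero constraint:

```
## sum-to-zero contrasts: effects over all levels sum to zero,
## so the level-3 effect is minus the sum of the reported ones
cond1 <- -0.3121925
cond2 <-  0.3011559
cond3 <- -(cond1 + cond2)   ## 0.0110366
```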

I find the parameterization with `bbmle` (with logit-probability and dispersion parameter) a little easier:

```
detach("package:VGAM")
library(bbmle)
mle2(k ~ dbetabinom(k, prob=plogis(lprob),
                    size=n, theta=exp(ltheta)),
     parameters=list(lprob~cond),
     data=df,
     start=list(lprob=0, ltheta=0))
## Coefficients:
## lprob.(Intercept)       lprob.cond1       lprob.cond2            ltheta
##       -0.09606536       -0.31615236        0.17353311        1.15201809
##
## Log-likelihood: -148.09
```

The log-likelihoods are about the same (the `VGAM` parameterization is a bit better); in theory, if we allowed both shape1 and shape2 (`VGAM`) or lprob and ltheta (`bbmle`) to vary across conditions, we'd get the same log-likelihoods for both parameterizations.
