user3266890 - 27 days ago 7

R Question

I am trying to fit my data to a beta-binomial distribution and estimate the alpha and beta shape parameters. For this distribution, the prior is taken from a beta distribution. Python does not have a fit function for beta-binomial but it does for beta. The python beta fitting and R beta binomial fitting is close but systematically off.

R:

`library("VGAM")`

x = c(222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508)

y = c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)

fit=vglm(cbind(y, x) ~ 1, betabinomialff, trace = TRUE)

Coef(fit)

shape1 shape2

1.736093 26.870768

python:

`import scipy.stats`

import numpy as np

x = np.array([222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508], dtype=float)

y = np.array([2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17])

scipy.stats.beta.fit((y)/(x+y), floc=0, fscale=1)

(1.5806623978910086, 24.031893492546242, 0, 1)

I have done this many times and it seems like python is systematically a little bit lower than the R results. I was wondering if this is an input error on my part or just a difference in the way they are calculated?

Answer Source

Your problem is that fitting a beta-binomial model is just not the same as fitting a Beta model with the values equal to the ratios. I'm going to illustrate here with the `bbmle`

package, which will fit similar models to `VGAM`

(but with which I'm more familiar).

Preliminaries:

```
library("VGAM") ## for dbetabinom.ab
x <- c(222,909,918,814,970,346,746,419,610,737,
201,865,573,188,450,229,629,708,250,508)
y <- c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
library("bbmle")
```

Fit beta-binomial model:

```
mle2(y~dbetabinom.ab(size=x+y,shape1,shape2),
data=data.frame(x,y),
start=list(shape1=2,shape2=30))
## Coefficients:
## shape1 shape2
## 1.736046 26.871526
```

This agrees more or less perfectly with the `VGAM`

result you quote.

Now use the same framework to fit a Beta model instead:

```
mle2(y/(x+y) ~ dbeta(shape1,shape2),
data=data.frame(x,y),
start=list(shape1=2,shape2=30))
## Coefficients:
## shape1 shape2
## 1.582021 24.060570
```

This fits your Python, beta-fit result. (I'm sure if you used `VGAM`

to fit the Beta you'd get the same answer too.)