user1436340 - 1 year ago 74

R Question

Apologies if the answer is obvious but I've spent quite some time trying to use a custom link function in mgcv.gam

In short,

- I want to use a modified probit link from package psyphy ( I want to use psyphy.probit_2asym, I call it )
`custom_link`

- I can create a {stats}family object with this link and use it in the 'family' argument of glm.

`m <- glm(y~x, family=binomial(link=custom_link), ... )`

- It does not work when used as an argument for {mgcv}gam

`m <- gam(y~s(x), family=binomial(link=custom_link), ... )`

I get the error`Error in fix.family.link.family(family) : link not recognised`

I do not get the reason for this error, both glm and gam work if I specify the standard

`link=probit`

So my question can be summarized as:

Thanks in advance if you can give me a hint on what I should do.

`probit.2asym <- function(g, lam) {`

if ((g < 0 ) || (g > 1))

stop("g must in (0, 1)")

if ((lam < 0) || (lam > 1))

stop("lam outside (0, 1)")

linkfun <- function(mu) {

mu <- pmin(mu, 1 - (lam + .Machine$double.eps))

mu <- pmax(mu, g + .Machine$double.eps)

qnorm((mu - g)/(1 - g - lam))

}

linkinv <- function(eta) {

g + (1 - g - lam) *

pnorm(eta)

}

mu.eta <- function(eta) {

(1 - g - lam) * dnorm(eta) }

valideta <- function(eta) TRUE

link <- paste("probit.2asym(", g, ", ", lam, ")", sep = "")

structure(list(linkfun = linkfun, linkinv = linkinv,

mu.eta = mu.eta, valideta = valideta, name = link),

class = "link-glm")

}

Answer Source

As you may know, `glm`

takes **iteratively reweighted least squares** fitting iterations. Early version of `gam`

extends this by fitting an **iteratively penalized reweighted least squares**, which is done by the `gam.fit`

function. This is known as **performance iteration** in some context.

Since 2008 (or maybe slightly even earlier), `gam.fit3`

based on what is called **outer iteration** has replaced `gam.fit`

as `gam`

default. Such change does require some extra information of the family, regarding which you can read about `?fix.family.link`

.

The major difference between two iterations is whether iteration of coefficients `beta`

and iteration of smoothing parameters `lambda`

are nested.

- Performance iteration takes the nested way, where for each update of
`beta`

, a single iteration of`lambda`

is performed; - Outer iteration completely separate those 2 iterations, where for each update of
`beta`

, iteration of`lambda`

is carried to the end till convergence.

Obviously outer iteration is more stable and less likely to suffer from failure of convergence.

`gam`

has an argument `optimizer`

. By default it takes `optimizer = c("outer", "newton")`

, that is the newton method of outer iteration; but if you set `optimizer = "perf"`

, it will take performance iteration.

So, after the above overview, we have two options:

- still use outer iteration, but expand your customized link function;
- use performance iteration to stay in line with
`glm`

.

I am being lazy so will demonstrate the second one *(actually I am not feeling too confident to take the first approach)*.

**Reproducible Example**

You did not provide a reproducible example, so I prepare one as below.

```
set.seed(0)
x <- sort(runif(500, 0, 1)) ## covariates (sorted to make plotting easier)
eta <- -4 + 3 * x * exp(x) - 2 * log(x) * sqrt(x) ## true linear predictor
p <- binomial(link = "logit")$linkinv(eta) ## true probability (response)
y <- rbinom(500, 1, p) ## binary observations
table(y) ## a quick check that data are not skewed
# 0 1
#271 229
```

I will take `g = 0.1`

and `lam = 0.1`

of the function `probit.2asym`

you intend to use:

```
probit2 <- probit.2asym(0.1, 0.1)
par(mfrow = c(1,3))
## fit a glm with logit link
glm_logit <- glm(y ~ x, family = binomial(link = "logit"))
plot(x, eta, type = "l", main = "glm with logit link")
lines(x, glm_logit$linear.predictors, col = 2)
## glm with probit.2asym
glm_probit2 <- glm(y ~ x, family = binomial(link = probit2))
plot(x, eta, type = "l", main = "glm with probit2")
lines(x, glm_probit2$linear.predictors, col = 2)
## gam with probit.2aysm
library(mgcv)
gam_probit2 <- gam(y ~ s(x, bs = 'cr', k = 3), family = binomial(link = probit2),
optimizer = "perf")
plot(x, eta, type = "l", main = "gam with probit2")
lines(x, gam_probit2$linear.predictors, col = 2)
```

I have used natural cubic spline basis `cr`

for `s(x)`

, as for univariate smooth the default setting with thin-plate spline is unnecessary. I have also set a small basis dimension `k = 3`

(can't be smaller for a cubic spline) as my toy data is near linear and big basis dimension is not needed. More importantly, this seems to prevent convergence failure of performance iteration for my toy dataset.