Luke - 9 months ago 50

R Question

I'm really scratching my head here. I really don't understand what is going on. This is a MWE, but the actual code and purpose is more complex then this. So the code:

`library(dplyr)`

ds <- mutate(iris, Species = as.numeric(Species == 'setosa'))

ds %>%

do_(

.dots = lazyeval::interp(

"broom::tidy(stats::glm(form, data = ., family = distr))",

form = Species ~ Sepal.Length,

distr = binomial()

)

)

Which returns:

`Error in family$linkfun(mustart) : object 'C_logit_link' not found`

`ds %>%`

do_(

.dots = lazyeval::interp(

"broom::tidy(stats::glm(form, data = ., family = distr))",

form = Sepal.Width ~ Sepal.Length,

distr = gaussian()

)

)

The only difference between the two is the family distribution used (gaussian vs binomial) and the variable used.

So the question: why is it that lazyeval can't find

`C_logit_link`

Answer Source

When you call `interp(x, *)`

, it evaluates the arguments that are to be interpolated into `x`

. In the case of `binomial()`

, the result is a structure that represents the binomial distribution in a GLM.

```
interp(~x, x=binomial())
#~list(family = "binomial", link = "logit", linkfun = function (mu)
#.Call(C_logit_link, mu), linkinv = function (eta)
#.Call(C_logit_linkinv, eta), variance = function (mu)
#mu * (1 - mu), dev.resids = function (y, mu, wt)
#.Call(C_binomial_dev_resids, y, mu, wt), aic = function (y, n,
# mu, wt, dev)
#{
# m <- if (any(n > 1))
# . . .
```

Buried inside that structure is a function that calls out to compiled C code, via the object `C_logit_link`

. This is an unexported object in the stats package. Normally everything works fine, because the environment of that function is the stats namespace and so it's able to find `C_logit_link`

.

The problem here is that the object you're interpolating is a string, which means that everything interpolated into it is also coerced into a string. That loses the environment information necessary to find `C_logit_link`

.

The solution is to interp a formula instead:

```
library(dplyr)
ds <- mutate(iris, Species = as.numeric(Species == 'setosa'))
ds %>%
do_(
.dots = lazyeval::interp(
~broom::tidy(stats::glm(form, data = ., family = distr)), # formula
form = Species ~ Sepal.Length,
distr = binomial()
)
)
# term estimate std.error statistic p.value
#1 (Intercept) 27.828521 4.8275611 5.764509 8.189574e-09
#2 Sepal.Length -5.175698 0.8933984 -5.793270 6.902910e-09
```