Denis Gorodetskiy - 11 months ago 50

R Question

I have a model in R:

`> s1 <- toys[1:10000,]`

> model <- glm(V11~V2+V3+V5+V7+V8+V9+V10,gaussian,s1)

> model

Call: glm(formula = V11 ~ V2 + V3 + V5 + V7 + V8 + V9 + V10, family = gaussian,

data = s1)

Coefficients:

(Intercept) V2 V3 V5 V7 V8 V9 V10

-0.900106 0.006385 -0.005080 1.006324 0.229282 0.012391 -0.049307 -0.186450

Degrees of Freedom: 9999 Total (i.e. Null); 9992 Residual

Null Deviance: 11050000

Residual Deviance: 121200 AIC: 53340

Now, how do I program this R model as a C function? (RTFM with a link would be enough)

Probably I just have to multiply all the coefficients from R model by their respective inputs and add all the terms to get the final result?

`float model(float v2, float v3, ... float v10)`

{

return -0.900106 * v2 + 0.006385 * v3 + .. + (-0.186450) * v10;

}

I need stand-alone code not depending on any external sources

Answer Source

You're asking for a linear regression model (here, R `glm()`

stands for generalized linear model, but as you're using an identity link, you end up with a linear regression). There are several implementation available in C, for example the apophenia library which features a nice set of statistical functions with bindings for MySQL and Python. The GSL and ALGLIB libraries also have dedicated algorithms.

However, for lightweight and almost standalone C code, I would suggest taking a look at `glm_test.c`

available in the source of the snpMatrix BioC package.

Following the updated question, it seems you rather want to predict the outcome based on a set of regression parameters. Then, given that the general form of the hypothesized model is y=b0 + b1 * x1 + b2 * x2 + ... + bp * xp, where b0 is the intercept and b1, ..., bp are the regression coefficients (estimated from the data), the computation is rather simple as it amounts to a weighted sum: take each observed values for your p predictors and multiply by the b's (don't forget the intercept term!).

You can double check your results with R `predict()`

function; here is an example with two predictors, named `V1`

and `V2`

, 100 observations, and a regular grid of new values for predicting the outcome (you can use your own data as well):

```
> df <- transform(X <- as.data.frame(replicate(2, rnorm(100))),
y = V1+V2+rnorm(100))
> res.lm <- lm(y ~ ., df)
> new.data <- data.frame(V1=seq(-3, 3, by=.5), V2=seq(-3, 3, by=.5))
> coef(res.lm)
(Intercept) V1 V2
0.006712008 0.980712578 1.127586352
> new.data
V1 V2
1 -3.0 -3.0
2 -2.5 -2.5
...
> 0.0067 + 0.9807*-3 + 1.1276*-3 # with approximation
[1] -6.3182
> predict(res.lm, new.data)[1]
1
-6.318185
```