Denis Gorodetskiy - 1 year ago 73
R Question

# How to implement R model in C++ code

I have a model in R:

``````> s1 <- toys[1:10000,]
> model <- glm(V11~V2+V3+V5+V7+V8+V9+V10,gaussian,s1)
> model

Call:  glm(formula = V11 ~ V2 + V3 + V5 + V7 + V8 + V9 + V10, family = gaussian,
data = s1)

Coefficients:
(Intercept)           V2           V3           V5           V7           V8           V9          V10
-0.900106     0.006385    -0.005080     1.006324     0.229282     0.012391    -0.049307    -0.186450

Degrees of Freedom: 9999 Total (i.e. Null);  9992 Residual
Null Deviance:      11050000
Residual Deviance: 121200   AIC: 53340
``````

Now, how do I program this R model as a C function? (RTFM with a link would be enough)

Probably I just have to multiply all the coefficients from R model by their respective inputs and add all the terms to get the final result?

``````float model(float v2, float v3, ... float v10)
{
return -0.900106 * v2 + 0.006385 * v3 + .. + (-0.186450) * v10;
}
``````

I need stand-alone code not depending on any external sources

You're asking for a linear regression model (here, R `glm()` stands for generalized linear model, but as you're using an identity link, you end up with a linear regression). There are several implementation available in C, for example the apophenia library which features a nice set of statistical functions with bindings for MySQL and Python. The GSL and ALGLIB libraries also have dedicated algorithms.

However, for lightweight and almost standalone C code, I would suggest taking a look at `glm_test.c` available in the source of the snpMatrix BioC package.

Following the updated question, it seems you rather want to predict the outcome based on a set of regression parameters. Then, given that the general form of the hypothesized model is y=b0 + b1 * x1 + b2 * x2 + ... + bp * xp, where b0 is the intercept and b1, ..., bp are the regression coefficients (estimated from the data), the computation is rather simple as it amounts to a weighted sum: take each observed values for your p predictors and multiply by the b's (don't forget the intercept term!).

You can double check your results with R `predict()` function; here is an example with two predictors, named `V1` and `V2`, 100 observations, and a regular grid of new values for predicting the outcome (you can use your own data as well):

``````> df <- transform(X <- as.data.frame(replicate(2, rnorm(100))),
y = V1+V2+rnorm(100))
> res.lm <- lm(y ~ ., df)
> new.data <- data.frame(V1=seq(-3, 3, by=.5), V2=seq(-3, 3, by=.5))
> coef(res.lm)
(Intercept)          V1          V2
0.006712008 0.980712578 1.127586352
> new.data
V1   V2
1  -3.0 -3.0
2  -2.5 -2.5
...
> 0.0067 + 0.9807*-3 + 1.1276*-3  # with approximation
[1] -6.3182
> predict(res.lm, new.data)[1]
1
-6.318185
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download