Denis Gorodetskiy Denis Gorodetskiy - 8 months ago 39
R Question

How to implement R model in C++ code

I have a model in R:

> s1 <- toys[1:10000,]
> model <- glm(V11~V2+V3+V5+V7+V8+V9+V10,gaussian,s1)
> model

Call: glm(formula = V11 ~ V2 + V3 + V5 + V7 + V8 + V9 + V10, family = gaussian,
data = s1)

(Intercept) V2 V3 V5 V7 V8 V9 V10
-0.900106 0.006385 -0.005080 1.006324 0.229282 0.012391 -0.049307 -0.186450

Degrees of Freedom: 9999 Total (i.e. Null); 9992 Residual
Null Deviance: 11050000
Residual Deviance: 121200 AIC: 53340

Now, how do I program this R model as a C function? (RTFM with a link would be enough)

Probably I just have to multiply all the coefficients from R model by their respective inputs and add all the terms to get the final result?

float model(float v2, float v3, ... float v10)
return -0.900106 * v2 + 0.006385 * v3 + .. + (-0.186450) * v10;

I need stand-alone code not depending on any external sources

chl chl

You're asking for a linear regression model (here, R glm() stands for generalized linear model, but as you're using an identity link, you end up with a linear regression). There are several implementation available in C, for example the apophenia library which features a nice set of statistical functions with bindings for MySQL and Python. The GSL and ALGLIB libraries also have dedicated algorithms.

However, for lightweight and almost standalone C code, I would suggest taking a look at glm_test.c available in the source of the snpMatrix BioC package.

Following the updated question, it seems you rather want to predict the outcome based on a set of regression parameters. Then, given that the general form of the hypothesized model is y=b0 + b1 * x1 + b2 * x2 + ... + bp * xp, where b0 is the intercept and b1, ..., bp are the regression coefficients (estimated from the data), the computation is rather simple as it amounts to a weighted sum: take each observed values for your p predictors and multiply by the b's (don't forget the intercept term!).

You can double check your results with R predict() function; here is an example with two predictors, named V1 and V2, 100 observations, and a regular grid of new values for predicting the outcome (you can use your own data as well):

> df <- transform(X <-, rnorm(100))), 
                                     y = V1+V2+rnorm(100))
> res.lm <- lm(y ~ ., df)
> <- data.frame(V1=seq(-3, 3, by=.5), V2=seq(-3, 3, by=.5))
> coef(res.lm)
(Intercept)          V1          V2 
0.006712008 0.980712578 1.127586352 
     V1   V2
1  -3.0 -3.0
2  -2.5 -2.5
> 0.0067 + 0.9807*-3 + 1.1276*-3  # with approximation
[1] -6.3182
> predict(res.lm,[1]