strv7 - 2 months ago 14

R Question

I have an

`lm`

`CREATE OR REPLACE FUNCTION lm_predict(`

feat_vec float[],

model bytea

)

RETURNS float

AS

$$

#R-code goes here.

mdl <- unserialize(model)

# class(feat_vec) outputs "array"

y_hat <- predict.lm(mdl, newdata = as.data.frame.list(feat_vec))

return (y_hat)

$$ LANGUAGE 'plr';

This returns the wrong

`y_hat`

`feat_vec`

`CREATE OR REPLACE FUNCTION lm_predict(`

feat_vec float[],

model bytea

)

RETURNS float

AS

$$

#R-code goes here.

mdl <- unserialize(model)

coef = mdl$coefficients

y_hat = coef[1] + as.numeric(coef[-1]%*%feat_vec)

return (y_hat)

$$ LANGUAGE 'plr';

What am I doing wrong?? It is the same unserialized model, the first option should give me the right answer as well...

Answer

The problem seems to be the use of `newdata = as.data.frame.list(feat_vec)`

. As discussed in your previous question, this returns ugly column names. While when you call `predict`

, `newdata`

must have column names consistent with covariates names in your model formula. You should get some warning message when you call `predict`

.

```
## example data
set.seed(0)
x1 <- runif(20)
x2 <- rnorm(20)
y <- 0.3 * x1 + 0.7 * x2 + rnorm(20, sd = 0.1)
## linear model
model <- lm(y ~ x1 + x2)
## new data
feat_vec <- c(0.4, 0.6)
newdat <- as.data.frame.list(feat_vec)
# X0.4 X0.6
#1 0.4 0.6
## prediction
y_hat <- predict.lm(model, newdata = newdat)
#Warning message:
#'newdata' had 1 row but variables found have 20 rows
```

What you need is

```
newdat <- as.data.frame.list(feat_vec,
col.names = attr(model$terms, "term.labels"))
# x1 x2
#1 0.4 0.6
y_hat <- predict.lm(model, newdata = newdat)
# 1
#0.5192413
```

This is the same as what you can compute manually:

```
coef = model$coefficients
unname(coef[1] + sum(coef[-1] * feat_vec))
#[1] 0.5192413
```