PDG PDG - 22 days ago 15
R Question

How to interpret lm() coefficient estimates when using bs() function for splines

I'm using a set of points which go from (-5,5) to (0,0) and (5,5) in a "symmetric V-shape". I'm fitting a model with lm() and the bs() function to fit a "V-shape" spline:

lm(formula = y ~ bs(x, degree = 1, knots = c(0)))


I get the "V-shape" when I predict outcomes with predict() and draw the prediction line. But when I look at the model estimates with coef(), I see estimates that I don't expect.

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                       4.93821    0.16117  30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079    0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545    0.21701  -0.256    0.805


I would expect a -1 coefficient for the first part and a +1 coefficient for the second part. Must I interpret the estimates in a different way?

If I put the knot into the lm() function manually, then I get these coefficients:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.18258    0.13558  -1.347    0.215
x           -1.02416    0.04805 -21.313 2.47e-08 ***
z            2.03723    0.08575  23.759 1.05e-08 ***


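That's more like it. The slope before the knot is about -1, and adding z's coefficient (z being the term that starts at the knot) to x's gives a slope of about +1 after the knot.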

I want to understand how to interpret the bs() result. I've checked: the predictions of the manual model and the bs() model are exactly the same.

Answer

I would expect a -1 coefficient for the first part and a +1 coefficient for the second part.

I think your question is really about what a B-spline function is. If you want to understand the meaning of the coefficients, you need to know what the basis functions of your spline are. See the following:

library(splines)
x <- seq(-5, 5, length.out = 100)  ## 100 evenly spaced points; seq(-5, 5, 100) would mean by = 100
b <- bs(x, degree = 1, knots = 0)  ## returns a basis matrix
str(b)  ## check structure
b1 <- b[, 1]  ## basis 1
b2 <- b[, 2]  ## basis 2
par(mfrow = c(1, 2))
plot(x, b1, type = "l", main = "basis 1: b1")
plot(x, b2, type = "l", main = "basis 2: b2")

[Figure: the two basis functions, "basis 1: b1" and "basis 2: b2", plotted against x]

Note:

  1. B-splines of degree 1 are tent functions, as you can see from b1;
  2. B-splines of degree 1 are scaled so that their values lie in [0, 1];
  3. a knot of a B-spline of degree 1 is where it bends;
  4. B-splines of degree 1 have compact support: each is non-zero only over (no more than) three adjacent knots.
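
If you would rather check these notes numerically than read them off the plot, here is a minimal sketch. The grid of 101 points is my own choice (so that the knot x = 0 lies on the grid), and the variable names are only for illustration.

```r
library(splines)

x <- seq(-5, 5, length.out = 101)        # grid that contains the knot x = 0
b <- bs(x, degree = 1, knots = 0)        # basis matrix with two columns

value_range      <- range(b)             # smallest and largest basis value
peak_b1          <- x[which.max(b[, 1])] # where the tent function b1 is largest
share_nonzero_b2 <- mean(b[, 2] > 0)     # fraction of the grid where b2 is active

print(value_range)
print(peak_b1)
print(share_nonzero_b2)
```

The range should stay within [0, 1], b1 should peak at the knot, and b2 should be non-zero only on roughly the right half of the grid.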

You can get the (recursive) expression of B-splines from Definition of B-spline. The B-spline of degree 0 is the most basic class, while

  • B-spline of degree 1 is a linear combination of B-spline of degree 0
  • B-spline of degree 2 is a linear combination of B-spline of degree 1
  • B-spline of degree 3 is a linear combination of B-spline of degree 2
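
To make the recursion concrete, here is a rough sketch of the standard Cox-de Boor recursion, which I assume is the definition meant above. The function bspline() and the knot vector kn are my own illustrative names, not part of the splines package. For your setup the knot vector is the two boundary knots, each repeated, plus the interior knot at 0.

```r
library(splines)

# Cox-de Boor recursion: basis function i of degree p on knot vector kn.
# Terms with a zero-width knot span are treated as 0 (the usual 0/0 := 0 convention).
bspline <- function(x, i, p, kn) {
  if (p == 0) {
    return(as.numeric(kn[i] <= x & x < kn[i + 1]))  # indicator of [kn[i], kn[i + 1])
  }
  left  <- if (kn[i + p] > kn[i]) (x - kn[i]) / (kn[i + p] - kn[i]) else 0
  right <- if (kn[i + p + 1] > kn[i + 1]) {
    (kn[i + p + 1] - x) / (kn[i + p + 1] - kn[i + 1])
  } else 0
  left * bspline(x, i, p - 1, kn) + right * bspline(x, i + 1, p - 1, kn)
}

kn <- c(-5, -5, 0, 5, 5)       # boundary knots repeated, interior knot at 0
x  <- seq(-5, 4.9, by = 0.1)   # stop short of 5 because the intervals are half-open

tent <- bspline(x, 2, 1, kn)   # should correspond to b1
ramp <- bspline(x, 3, 1, kn)   # should correspond to b2

# Compare with bs(); Boundary.knots is set explicitly because x stops at 4.9
b <- bs(x, degree = 1, knots = 0, Boundary.knots = c(-5, 5))
max_diff <- max(abs(b[, 1] - tent), abs(b[, 2] - ramp))
print(max_diff)
```

The first basis function of the full set (index 1) is the one bs() drops by default, since intercept = FALSE; that is why indices 2 and 3 are compared with the two columns of b.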

(Sorry, I was getting off-topic...)

Your linear regression using B-splines:

y ~ bs(x, degree = 1, knots = 0)

is just doing:

y ~ b1 + b2
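
A quick way to convince yourself of this equivalence is to fit both formulas and compare the coefficients. The data below are simulated by me (a noisy V-shape with an arbitrary seed), since I don't have your actual points.

```r
library(splines)

set.seed(42)                                 # arbitrary seed, for reproducibility only
x <- seq(-5, 5, by = 0.5)
y <- abs(x) + rnorm(length(x), sd = 0.2)     # hypothetical noisy V-shaped data

b  <- bs(x, degree = 1, knots = 0)
b1 <- b[, 1]
b2 <- b[, 2]

coef_formula <- unname(coef(lm(y ~ bs(x, degree = 1, knots = 0))))
coef_columns <- unname(coef(lm(y ~ b1 + b2)))

print(rbind(coef_formula, coef_columns))     # the two rows should agree
```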

Now you should be able to understand what the coefficients you get mean. Together with the intercept, the fitted spline function is:

4.93821 - 5.12079 * b1 - 0.05545 * b2
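
To get back the slopes you were expecting, note that on each side of the knot b1 and b2 are straight lines that change by 1 over a distance of 5 (the distance from the knot to each boundary). The slope left of the knot is therefore (coefficient of b1) / 5, and the slope right of the knot is (coefficient of b2 - coefficient of b1) / 5. With your numbers that gives -5.12079 / 5 = -1.02416 and (-0.05545 + 5.12079) / 5 = 1.01307, which agrees with your manual fit (-1.02416, and -1.02416 + 2.03723 = 1.01307). Below is a sketch on simulated data; I am assuming your z is the hinge term pmax(x, 0), which is consistent with your manual intercept but is not stated in the question.

```r
library(splines)

set.seed(1)                                  # arbitrary seed, for reproducibility only
x <- seq(-5, 5, by = 1)
y <- abs(x) + rnorm(length(x), sd = 0.3)     # hypothetical noisy V-shaped data

fit_bs  <- lm(y ~ bs(x, degree = 1, knots = 0))
z       <- pmax(x, 0)                        # hinge term; assumed form of the asker's z
fit_man <- lm(y ~ x + z)

beta       <- unname(coef(fit_bs))           # intercept, b1, b2
half_width <- 5                              # distance from the knot to each boundary knot
slope_left  <- beta[2] / half_width
slope_right <- (beta[3] - beta[2]) / half_width

gam <- unname(coef(fit_man))                 # intercept, x, z
print(c(slope_left, slope_right))            # slopes recovered from the bs() fit
print(c(gam[2], gam[2] + gam[3]))            # slopes from the manual fit
```

Both parameterisations span the same space of continuous piecewise-linear functions with a bend at 0, which is why the predictions are identical even though the coefficients look so different.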

In the summary table:

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)                       4.93821    0.16117  30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079    0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545    0.21701  -0.256    0.805 

You might wonder why the coefficient of b2 is not significant. Compare your y with b1: your y is a symmetric V-shape, while b1 is an inverted symmetric V-shape (a tent). If you multiply b1 by -1, rescale it by 5 (this explains the coefficient of about -5 for b1), and shift it up by the intercept of about 5, what do you get? A good match, right? So there is no need for b2.

However, if your y is asymmetric, running through (-5,5) to (0,0), then to (5,10), then you will notice that the coefficients for b1 and b2 are both significant. I think the other answer already gave you such an example.
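
Here is a small sketch of both cases side by side. I use noise-free data of my own so that the coefficients come out (almost) exactly; with a perfect fit the standard errors are meaningless, so this only illustrates the size of the coefficients, not their significance.

```r
library(splines)

x      <- seq(-5, 5, by = 0.5)
y_sym  <- abs(x)                        # (-5,5) -> (0,0) -> (5,5)
y_asym <- ifelse(x < 0, -x, 2 * x)      # (-5,5) -> (0,0) -> (5,10)

coef_sym  <- unname(coef(lm(y_sym  ~ bs(x, degree = 1, knots = 0))))
coef_asym <- unname(coef(lm(y_asym ~ bs(x, degree = 1, knots = 0))))

print(round(coef_sym, 6))    # intercept about 5, b1 about -5, b2 about 0
print(round(coef_asym, 6))   # intercept about 5, b1 about -5, b2 about 5
```

The intercept is the fitted value at the left boundary (where b1 and b2 are both 0), the b1 coefficient is the change from there to the knot, and the b2 coefficient is the change from the left boundary to the right boundary. In the symmetric case the two ends are at the same height, so b2 has nothing to do.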
