Manassa Mauler - 1 month ago 9

R Question

I want to work out a multiple regression example all the way through using matrix algebra to calculate the regression coefficients.

`#create vectors -- these will be our columns`

y <- c(3,3,2,4,4,5,2,3,5,3)

x1 <- c(2,2,4,3,4,4,5,3,3,5)

x2 <- c(3,3,4,4,3,3,4,2,4,4)

#create matrix from vectors

M <- cbind(y,x1,x2)

k <- ncol(M) #number of variables

n <- nrow(M) #number of subjects

#create means for each column

M_mean <- matrix(data=1, nrow=n) %*% cbind(mean(y),mean(x1),mean(x2)); M_mean

#creates a difference matrix which gives deviation scores

D <- M - M_mean; D

#creates the covariance matrix, the sum of squares are in the diagonal and the sum of cross products are in the off diagonals.

C <- t(D) %*% D; C

I can see what the final values should be (-.19, -.01) and what the matrices before this calculation look like.

`E<-matrix(c(10.5,3,3,4.4),nrow=2,ncol=2)`

F<-matrix(c(-2,-.6),nrow=2,ncol=1)

But I'm not sure how to create these from the variance-covariance matrix to get the coefficients using matrix algebra.

Hope you can help.

Answer

I can see that you are doing centred regression:

The answer by sandipan is not quite what you want, as it goes through the usual normal equation to estimate:

There is already a thread on the latter: Solving normal equation gives different coefficients from using `lm`

? Here I focus on the former.

Actually you are already quite close. You have obtained the mixed covariance `C`

:

```
# y x1 x2
#y 10.4 -2.0 -0.6
#x1 -2.0 10.5 3.0
#x2 -0.6 3.0 4.4
```

From your definition of `E`

and `F`

, you know you need sub-matrices to proceed. In fact, you can do matrix subsetting rather than manually imputing:

```
E <- C[2:3, 2:3]
# x1 x2
#x1 10.5 3.0
#x2 3.0 4.4
F <- C[2:3, 1, drop = FALSE] ## note the `drop = FALSE`
# y
#x1 -2.0
#x2 -0.6
```

Then the estimate is just , and you can do in R (read `?solve`

):

```
c(solve(E, F)) ## use `c` to collapse matrix into a vector
# [1] -0.188172043 -0.008064516
```

**Other suggestions**

- you can find column means by
`colMeans`

, instead of a matrix multiplication (read`?colMeans`

); - you can perform centring by using
`sweep`

(read`?sweep`

); - use
`crossprod(D)`

than`t(D) %*% D`

(read`?crossprod`

).

Here is a session I would do:

```
y <- c(3,3,2,4,4,5,2,3,5,3)
x1 <- c(2,2,4,3,4,4,5,3,3,5)
x2 <- c(3,3,4,4,3,3,4,2,4,4)
M <- cbind(y,x1,x2)
M_mean <- colMeans(M)
D <- sweep(M, 2, M_mean)
C <- crossprod(D)
E <- C[2:3, 2:3]
F <- C[2:3, 1, drop = FALSE]
c(solve(E, F))
# [1] -0.188172043 -0.008064516
```