TonyGW - 1 month ago
R Question

Matrix multiplication in R: requires numeric/complex matrix/vector arguments

I'm using the dataset BreastCancer in the mlbench package, and I am trying to do the following matrix multiplication as part of logistic regression.

I took the features in the first 10 columns and created a vector of parameters called theta:

X <- BreastCancer[,1:10]
theta <- data.frame(rep(1,10))


Then I did the following matrix multiplication:

constant <- as.matrix(X) %*% as.vector(theta[,1])


However, I got the following error:

Error in as.matrix(X) %*% as.vector(theta[, 1]) :
requires numeric/complex matrix/vector arguments


Do I need to cast the matrix to double using as.numeric(X) first? I ask because the values in X look like strings in double quotes.

Edit:
@Zheyuan Li:

My question is different from the one you are referring to, as that one does not involve the same error as mine: numeric/complex matrix/vector arguments. Please re-open the question. Thanks

Answer

No, I can't stand it... After quite a long-winded discussion and something of an argument under your question, I saw no better way than to reopen this and answer it.
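For context, the error in the question comes from calling as.matrix() on a data frame whose columns are not numeric: the result is a character matrix, which %*% refuses. Here is a minimal sketch of that behaviour, assuming a made-up data frame `toy` with two factor columns as a stand-in for BreastCancer (the names `toy`, `a` and `b` are illustrative only):

```r
# Toy data frame with factor columns (hypothetical stand-in for BreastCancer)
toy <- data.frame(
  a = factor(c("1", "5", "10")),
  b = factor(c("2", "2", "3"))
)

# as.matrix() on non-numeric columns produces a character matrix
m_chr <- as.matrix(toy)
is.character(m_chr)

# %*% on a character matrix raises an error; capture its message
res <- tryCatch(m_chr %*% c(1, 1), error = function(e) conditionMessage(e))
res

# Converting each factor through its level labels gives genuine numbers
toy[] <- lapply(toy, function(x) as.numeric(levels(x))[x])
m_num <- as.matrix(toy)
m_num %*% c(1, 1)
```

Once the columns hold real numbers, the same multiplication goes through.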

## drop incomplete data with NA
dat <- na.omit(BreastCancer)

## data type convert for variables other than `ID` and `Class`
dat[2:10] <- lapply(dat[2:10], function (x) as.numeric(levels(x))[x])

## get the matrix
X <- data.matrix(dat[2:10])

## some possible matrix-vector multiplications
beta <- runif(9)
yhat <- X %*% beta

## add prediction back to data frame (drop() turns the one-column matrix into a vector)
dat$prediction <- drop(yhat)

There are several things I don't understand, though. Why don't you use predict if you have a regression model? You gave an explanation, but I don't follow it. Anyway, the above should cover everything: if you want a data frame, there it is; if you want matrix-vector multiplication on legitimate numeric columns, go ahead; if you want to put the prediction back into the data frame, that is also done.
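On the predict point, here is a rough sketch of how predict relates to the manual multiplication. It assumes the built-in mtcars data and an arbitrary model formula as stand-ins, since the asker's actual model is not shown. Note that a fitted model with an intercept needs a design matrix with a column of ones, which model.matrix supplies.

```r
# Fit a logistic regression on a built-in dataset (mtcars is only a stand-in)
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# predict() returns the linear predictor directly
eta_predict <- predict(fit, type = "link")

# The same quantity by hand: design matrix (including the intercept column)
# times the coefficient vector
X_design <- model.matrix(fit)
eta_manual <- drop(X_design %*% coef(fit))

all.equal(unname(eta_predict), unname(eta_manual))

# Fitted probabilities are the inverse logit of the linear predictor
p_manual <- plogis(eta_manual)
all.equal(unname(p_manual), unname(predict(fit, type = "response")))
```

Both comparisons should agree up to floating-point tolerance, so the hand-rolled product is only needed when there is no fitted model object to call predict on.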


This line also worked for me: as.matrix(sapply(dat, as.numeric))

Looks like you were lucky. The dataset happens to have factor levels identical to the underlying integer codes. In general, a factor should be converted to numeric using the method I used above. Compare

f <- gl(4, 2, labels = c(12.3, 0.5, 2.9, -11.1))
#[1] 12.3  12.3  0.5   0.5   2.9   2.9   -11.1 -11.1
#Levels: 12.3 0.5 2.9 -11.1

as.numeric(f)
#[1] 1 1 2 2 3 3 4 4

as.numeric(levels(f))[f]
#[1] 12.3  12.3  0.5   0.5   2.9   2.9   -11.1 -11.1

Please read about ?factor thoroughly.
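To apply the conversion above to a whole data frame, something along these lines could work. The helper name `factor_cols_to_numeric` is made up for illustration, and it assumes every factor's level labels are valid numbers (otherwise as.numeric returns NA with a warning).

```r
# Hypothetical helper: convert every factor column of a data frame to numeric
# through its level labels, leaving other columns unchanged
factor_cols_to_numeric <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], function(x) as.numeric(levels(x))[x])
  df
}

f <- gl(4, 2, labels = c(12.3, 0.5, 2.9, -11.1))
df <- data.frame(f = f, g = 1:8)
out <- factor_cols_to_numeric(df)
out$f

# as.numeric(as.character(f)) yields the same values; it converts every
# element rather than only the distinct levels
identical(as.numeric(as.character(f)), as.numeric(levels(f))[f])
```

The as.character route is the other idiom mentioned in ?factor; the levels-based form does less work when a factor has few levels and many elements.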
