Sacha Epskamp - 1 month ago
R Question

How does glmnet compute the maximal lambda value?

The glmnet package uses a range of LASSO tuning parameters lambda scaled from the maximal lambda_max under which no predictors are selected. I want to find out how glmnet computes this lambda_max value. For example, in a trivial dataset:

set.seed(1)
library("glmnet")
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100)
fitGLM <- glmnet(x,y)
max(fitGLM$lambda)
# 0.1975946


The package vignette (http://www.jstatsoft.org/v33/i01/paper) describes in section 2.5 that it computes this value as follows:

sx <- as.matrix(scale(x))
sy <- as.vector(scale(y))
max(abs(colSums(sx*sy)))/100
# 0.1865232


This is clearly close, but not the same value. So, what causes this difference? And, as a related question, how could I compute lambda_max for a logistic regression?

Answer

To get the same result, you need to standardize the variables using a standard deviation with an n denominator instead of n-1.

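# Standard deviation with an n (not n-1) denominator
mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y))
# Standardize predictors and response with that definition;
# scale() already returns a matrix, so no as.matrix() call is needed
sx <- scale(x,scale=apply(x,2,mysd))
# scale= must be a number, so pass mysd(y) rather than the function mysd
sy <- as.vector(scale(y,scale=mysd(y)))
max(abs(colSums(sx*sy)))/100
## [1] 0.1758808
fitGLM <- glmnet(sx,sy)
max(fitGLM$lambda)
## [1] 0.1758808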
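
The question also asked about logistic regression, which the code above does not cover. Below is a minimal sketch, not taken from the glmnet source. It assumes an intercept-only null model, a pure lasso penalty (alpha = 1), predictors standardized with the n-denominator standard deviation, and a 0/1 response for the binomial case. Under those assumptions the optimality conditions give the same expression for both families: the largest absolute inner product between a standardized predictor and y - mean(y), divided by n. The names sd_n, lambda_max_sketch and y_bin are hypothetical helpers introduced here for illustration. The comparison with glmnet runs only if the package is installed.

```r
# Population (1/n) standard deviation, same definition as mysd in the answer.
sd_n <- function(v) sqrt(sum((v - mean(v))^2) / length(v))

# Hypothetical helper: smallest lambda at which all lasso coefficients are zero,
# assuming an intercept-only null model and alpha = 1.
# Gaussian: the null residuals are y - mean(y).
# Binomial (y coded 0/1): the intercept-only fitted probability is mean(y),
# so the null residuals are again y - mean(y).
lambda_max_sketch <- function(x, y) {
  sx <- scale(x, center = TRUE, scale = apply(x, 2, sd_n))
  r0 <- y - mean(y)
  max(abs(crossprod(sx, r0))) / nrow(x)
}

set.seed(1)
x     <- matrix(rnorm(100 * 20), 100, 20)
y     <- rnorm(100)            # continuous response
y_bin <- rbinom(100, 1, 0.5)   # hypothetical 0/1 response for the logistic case

lam_gauss <- lambda_max_sketch(x, y)
lam_binom <- lambda_max_sketch(x, y_bin)

# Optional check against glmnet; skipped when the package is not available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  print(c(sketch = lam_gauss,
          glmnet = max(glmnet::glmnet(x, y)$lambda)))
  print(c(sketch = lam_binom,
          glmnet = max(glmnet::glmnet(x, y_bin, family = "binomial")$lambda)))
}
```

Because the expression is linear in y, rescaling the response rescales the result by the same factor, so the value for an unstandardized y should be the standardized value multiplied by sd_n(y). Whether that matches what glmnet reports for raw data depends on how it rescales lambda internally, so treat the printed comparison as the check rather than this sketch.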