Regression Shrinkage and Selection via the Lasso
These are my notes from the paper Regression Shrinkage and Selection via the Lasso by Robert Tibshirani.
Overview
The paper introduces a method for estimating coefficients of a Linear Regression model called Lasso or Least Absolute Shrinkage and Selection Operator.
It builds on Ordinary Least Squares by constraining the sum of the absolute values of the coefficients to be at most some value $t$.
Compared with the alternatives, it tends to generalise better than Variable Subset Selection and to produce more interpretable models than Ridge Regression. It typically sets some coefficients to exactly 0 and keeps the key features, so we can see which predictors affect the outcome the most.
Lasso Algorithm
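We are given data $(\mathbf{x}^i, y_i)$ for $i = 1, 2, \ldots, N$, where $\mathbf{x}^i = (x_{i1}, \ldots, x_{ip})^T$ are the predictor variables and $y_i$ are the labels.
The observations are assumed to be independent, or the $y_i$ are conditionally independent given the $x_{ij}$. The predictors are assumed to be standardised so that $\sum_i x_{ij} / N = 0$ and $\sum_i x_{ij}^2 / N = 1$.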
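The standardisation assumed above (each predictor column with mean 0 and mean square 1) can be sketched in a few lines of plain Python. The function name `standardise` and the column-wise list layout are assumptions made for this illustration, not something taken from the paper.

```python
import math


def standardise(columns):
    """Centre each predictor column to mean 0 and scale it to mean square 1.

    `columns` is a list of predictor columns, each a list of N floats
    (a hypothetical layout chosen for this sketch).
    """
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        centred = [v - mean for v in col]
        # Root mean square of the centred values; note the division by N,
        # not N - 1, to match the condition sum_i x_ij^2 / N = 1.
        rms = math.sqrt(sum(v * v for v in centred) / n)
        if rms == 0.0:
            raise ValueError("a constant column cannot be standardised")
        result.append([v / rms for v in centred])
    return result


columns = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 20.0]]
standardised = standardise(columns)
for col in standardised:
    # Both checks should print values very close to 0 and 1 respectively.
    print(sum(col) / len(col), sum(v * v for v in col) / len(col))
```

Dividing by $N$ rather than $N - 1$ is deliberate here, since the condition is stated in terms of $\sum_i x_{ij}^2 / N$.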
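The Lasso estimate $(\hat\alpha, \hat\beta)$, with $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)^T$, is defined as:
$$(\hat\alpha, \hat\beta) = \arg\min \left\{ \sum_{i=1}^N \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2 \right\} \quad \text{subject to} \quad \sum_j |\beta_j| \le t$$
Here $t \ge 0$ is a tuning parameter.
For all $t$, the solution for $\alpha$ is $\hat\alpha = \bar{y}$. We can assume without loss of generality that $\bar{y} = 0$ and hence omit the bias term $\alpha$, focusing on the coefficients $\beta$.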
Computation of the solution is a quadratic programming problem with linear inequality constraints.
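As a rough illustration of solving the constrained problem numerically, here is a minimal sketch in plain Python. It does not use the quadratic programming formulation mentioned above. Instead it swaps in projected gradient descent: take a gradient step on the residual sum of squares, then project the coefficients back onto the region where the sum of absolute values is at most $t$. The function names, the step size rule, and the toy data are all assumptions made for this example.

```python
import math


def project_l1_ball(v, t):
    """Euclidean projection of v onto the set {b : sum_j |b_j| <= t}."""
    if t <= 0:
        return [0.0] * len(v)
    abs_v = [abs(x) for x in v]
    if sum(abs_v) <= t:
        return list(v)  # already feasible
    # Find the threshold theta with sum_j max(|v_j| - theta, 0) == t,
    # using the standard sort-based procedure.
    theta, cumsum = 0.0, 0.0
    for k, u_k in enumerate(sorted(abs_v, reverse=True), start=1):
        cumsum += u_k
        candidate = (cumsum - t) / k
        if u_k - candidate > 0:
            theta = candidate
    return [math.copysign(max(a - theta, 0.0), x) for a, x in zip(abs_v, v)]


def lasso_constrained(X, y, t, n_iter=2000):
    """Minimise sum_i (y_i - sum_j beta_j x_ij)^2 subject to sum_j |beta_j| <= t.

    Assumes the columns of X are standardised and y is centred, so there is
    no intercept. X is a list of rows.
    """
    p = len(X[0])
    # Conservative step size: the gradient's Lipschitz constant is at most
    # 2 * (sum of all squared entries of X).
    step = 1.0 / (2.0 * sum(x * x for row in X for x in row))
    beta = [0.0] * p
    for _ in range(n_iter):
        residual = [y_i - sum(b * x for b, x in zip(beta, row))
                    for row, y_i in zip(X, y)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(residual, X))
                for j in range(p)]
        beta = project_l1_ball([b - step * g for b, g in zip(beta, grad)], t)
    return beta


# Toy data: two orthogonal, standardised predictors and y = 3*x1 + 1*x2,
# so the least squares solution is (3, 1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [4.0, 2.0, -2.0, -4.0]
print(lasso_constrained(X, y, t=2.0))
```

On this toy design the least squares coefficients are $(3, 1)$, and with $t = 2$ the iterates should approach roughly $(2, 0)$: the second coefficient is driven to zero while the first is shrunk. The fixed iteration count and the conservative step size keep the sketch short, and a practical solver would add a convergence check.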
The parameter $t \ge 0$ controls the amount of shrinkage applied to the estimates.
Let $\hat\beta_j^o$ be the full least squares estimates and let $t_0 = \sum_j |\hat\beta_j^o|$. Values of $t < t_0$ will cause shrinkage of the solutions towards 0, and some coefficients may be exactly equal to 0. For example, if $t = t_0 / 2$, the effect will be roughly similar to finding the best subset of size $p/2$. Note also that the design matrix does not need to be of full rank.
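To make the role of the tuning parameter concrete, consider the simplest case of a single standardised predictor ($p = 1$, $\sum_i x_i^2 = N$) with centred labels. This is a worked special case added for illustration, not a result quoted from the paper. Write $\hat\beta^o = \sum_i x_i y_i / N$ for the least squares estimate and $t$ for the bound on $|\beta|$, so that the threshold at which shrinkage starts is $t_0 = |\hat\beta^o|$.

$$\sum_{i=1}^N (y_i - \beta x_i)^2 = \sum_{i=1}^N (y_i - \hat\beta^o x_i)^2 + N (\beta - \hat\beta^o)^2$$

The first term does not depend on $\beta$, so minimising subject to $|\beta| \le t$ just moves $\beta$ as close to $\hat\beta^o$ as the constraint allows:

$$\hat\beta = \operatorname{sign}(\hat\beta^o) \, \min\big(|\hat\beta^o|, \, t\big)$$

For $t \ge t_0$ the constraint is inactive and the least squares estimate is returned unchanged. For $t < t_0$ the estimate is shrunk to $\pm t$, and for $t = 0$ it is exactly 0.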
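The motivation for the Lasso came from Breiman's Non-negative Garotte, which minimises:
$$\sum_{i=1}^N \Big( y_i - \alpha - \sum_j c_j \hat\beta_j^o x_{ij} \Big)^2 \quad \text{subject to} \quad c_j \ge 0, \quad \sum_j c_j \le t$$
Here the $\hat\beta_j^o$ are the Ordinary Least Squares estimates and the $c_j$ are non-negative scaling factors applied to them.
A drawback of the garotte is that its solution depends on both the sign and the magnitude of the Ordinary Least Squares estimates.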
In overfit or highly correlated settings where the OLS estimates behave poorly, the garotte may suffer as a result. In contrast, the Lasso avoids the explicit use of the OLS estimates.