How do I do regression when my predictors have multicollinearity?

A quick summary of the key idea of principal components regression (PCR), its advantages and extensions.

Sometimes we find ourselves in a dire situation. We have measured some response y and a set of predictors W. Unfortunately, W is a wide but short matrix, say 10×100 or, worse, 10×100000. We’ve made only 10 observations. Standard least squares is simply not going to work, because WᵀW is singular (W has rank at most 10). Some would say p is bigger than n.

So what can we do? Many of us would jump to LASSO or ridge regression. However, there is another way that is often overlooked.

Principal components analysis (PCA) is a popular method for dimensionality reduction. In brief, we project our data onto an orthogonal basis, with the components ordered by descending variance. PCA is often used to discard components of low variance, or to visualise the directions in the data with the most variance. It is a quick and easy way to perform dimensionality reduction, avoid multicollinearity (the transformed variables are orthogonal), and regularise (dropping low-variance components is a form of regularisation).
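As a quick sketch in base R (the data here are simulated purely for illustration; in the wild, W would be your own 10×100 predictor matrix):

```r
# Simulated wide-but-short data: n = 10 observations, p = 100 predictors
set.seed(1)
n <- 10
p <- 100
W <- matrix(rnorm(n * p), nrow = n, ncol = p)

# PCA via base R's prcomp; centre and scale the columns first
pca <- prcomp(W, center = TRUE, scale. = TRUE)

# Components come out ordered by descending variance
summary(pca)$importance[, 1:5]  # proportion of variance explained
head(pca$x[, 1:2])              # scores: the data projected onto the first two PCs
```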

I thought this was a blog post about regression! We often find ourselves with a set of predictors that are highly correlated or even collinear. Even worse, the predictor matrix can be rank-deficient. A “trick” is to perform PCA on the predictor matrix W and use only a few principal components for the regression problem. Let’s say we use 2. Then, returning to the original problem, we have a new matrix P, which is 10×2. Regression now proceeds as usual: simply use P instead of W. The two components are orthogonal, so there are no fears of multicollinearity, and the regression problem is now well posed. Phew! This idea is called principal components regression (PCR).
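Here is that trick as a minimal sketch, continuing with the simulated W and pca from above (the response y and the “true” coefficients are made up for illustration):

```r
# A sparse "true" coefficient vector, purely for illustration
beta_true <- c(rep(1, 5), rep(0, p - 5))
y <- drop(W %*% beta_true) + rnorm(n, sd = 0.5)

# Keep the first two principal component scores: P is now 10 x 2
P <- pca$x[, 1:2]

# Ordinary least squares on the orthogonal scores -- no multicollinearity
fit <- lm(y ~ P)
summary(fit)
```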

But Olly, how do I choose the number of components!? In dimensionality reduction, choosing the number of principal components can feel arbitrary. For example, a common rule is to take as few components as possible such that we explain 99% of the variance. In regression, however, we have a response y. Hence, we can use cross-validation to find the number of components that minimises the mean-squared error (MSE) of prediction. If there is multicollinearity, including more components won’t necessarily help prediction, so we end up with a parsimonious model.
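The pls package automates this (a sketch continuing the example above; I cap ncomp at 6, since with only 10 observations just a handful of components are estimable inside the CV folds):

```r
library(pls)

dat <- data.frame(y = y, W = I(W))  # I() keeps W as a single matrix column

# Fit PCR for 1..6 components, with cross-validation built in
fit_pcr <- pcr(y ~ W, data = dat, ncomp = 6, scale = TRUE, validation = "CV")

# Cross-validated prediction error for each number of components
RMSEP(fit_pcr)
validationplot(fit_pcr, val.type = "MSEP")  # pick the minimiser
```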

Technical side-step. For those interested: any given linear form of the PCR estimator (the estimate of the coefficient vector) has a variance no larger than that of the ordinary least squares estimator applied to the same linear form. This is left as an exercise for the reader, with a hint: use the spectral decomposition implicit in PCA.
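For the curious, here is the skeleton of that exercise in my own notation (assuming WᵀW = VΛVᵀ is invertible so that the OLS estimator exists, with V_k and Λ_k holding the first k eigenvectors and eigenvalues):

```latex
\operatorname{Var}\big(\hat\beta_{\mathrm{OLS}}\big)
  = \sigma^2 (W^\top W)^{-1}
  = \sigma^2 V \Lambda^{-1} V^\top,
\qquad
\operatorname{Var}\big(\hat\beta_{\mathrm{PCR}}\big)
  = \sigma^2 V_k \Lambda_k^{-1} V_k^\top,

\operatorname{Var}\big(\hat\beta_{\mathrm{OLS}}\big)
  - \operatorname{Var}\big(\hat\beta_{\mathrm{PCR}}\big)
  = \sigma^2 \sum_{j > k} \lambda_j^{-1} v_j v_j^\top
  \succeq 0 .
```

The difference is positive semi-definite, so cᵀVar(β̂_PCR)c ≤ cᵀVar(β̂_OLS)c for every linear form cᵀβ̂.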

PCR can be used to perform guided dimensionality reduction. Suppose we have some response y that is “useful”. We can use the cross-validation approach introduced earlier to choose the number of components of W, rather than relying on variance explained.
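Continuing with fit_pcr from above (selectNcomp is a pls helper; the “onesigma” heuristic is my arbitrary choice here, not anything canonical):

```r
# Choose the number of components by cross-validated prediction error;
# guard against the heuristic choosing zero components
k <- max(1, selectNcomp(fit_pcr, method = "onesigma"))

# The y-guided, reduced representation of W: the first k score vectors
Z <- scores(fit_pcr)[, 1:k, drop = FALSE]
dim(Z)
```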

PCR is hard regularisation. The PCR estimator is constrained to the column space of the selected principal component directions; as a result, it is orthogonal to every excluded direction. If the true signal has a component along a discarded (low-variance) direction, PCR simply cannot recover it. You have been warned!
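You can see this numerically by mapping the score-space coefficients back to the original predictors (a base-R sketch continuing the example; note these coefficients live on the centred-and-scaled predictor scale):

```r
V_k <- pca$rotation[, 1:2]               # loadings of the two kept directions
gamma <- coef(lm(y ~ pca$x[, 1:2]))[-1]  # coefficients on the scores
beta_pcr <- V_k %*% gamma                # coefficients on all 100 predictors

# beta_pcr lies in span(V_k): its projection onto every excluded
# direction is (numerically) zero
max(abs(t(pca$rotation[, -(1:2)]) %*% beta_pcr))
```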

PCR is more difficult to interpret than standard linear regression. The coefficients obtained by linear regression are easy to interpret; in PCR, however, the coefficients apply to linear combinations of the original predictors. Unravelling this can be a bit tricky!
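One small mercy: the back-mapped coefficients from the sketch above can still be ranked, e.g. to see which original predictors the fit leans on most (still on the scaled predictor scale):

```r
# Predictors with the largest absolute PCR coefficients
ord <- order(abs(beta_pcr), decreasing = TRUE)
head(cbind(predictor = ord, coefficient = beta_pcr[ord]), 5)
```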

Just as PCA can be extended to kernel PCA, PCR can be extended to the kernel setting; the regression function then need not be linear in the covariates. In another direction (sorry!), our response y need not be continuous: we can use generalised linear models to connect the outcome to our principal components.
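For instance, with a binary outcome (simulated here, purely for illustration), a logistic link on the component scores is a one-liner:

```r
# A simulated 0/1 response for illustration
y_bin <- rbinom(n, 1, plogis(pca$x[, 1]))

# Logistic regression on the first two principal component scores
fit_glm <- glm(y_bin ~ pca$x[, 1:2], family = binomial)
summary(fit_glm)
```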

One of the key tools in functional linear models, the field that studies predictors and responses that are themselves functions, is functional principal components analysis. Translating PCR into this infinite-dimensional setting is fundamental to functional data analysis.

So if you have too many predictors, but some of them might be redundant or multicollinear – why not try PCR?

I suggest the pls R package, if you want to have a play!
