Overview of Feature Selection Techniques


If our goal is to select a model that will make the best predictions on new datasets, then, now that we have a metric that evaluates the parsimoniousness of a model, we could in theory evaluate every possible model that can be built from a set of $p$ candidate explanatory variables. Then, we could select the model with the highest adjusted R^2.

However, as we discussed earlier, even without interaction terms, one can construct $2^p$ possible models based on whether each of the $p$ explanatory variables is included or excluded. If the number of candidate explanatory variables is very small, like the $p=5$ that we are currently considering, then evaluating $2^5=32$ candidate models is a realistic computational exercise.

However, even with a modest number of potential explanatory variables, say $p=30$, the number of regression models that we would need to evaluate combinatorially explodes to $2^{30}=1,073,741,824$.

2**30  # number of candidate models when p = 30

1073741824

The benefit of conducting this exhaustive search of all possible models, however, is that we know definitively which model has, say, the highest adjusted R^2.
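To make this concrete, below is a minimal sketch of such an exhaustive (best-subset) search using statsmodels. The DataFrame df, its response y, and the predictor names x1 through x5 are hypothetical stand-ins for whatever dataset is under consideration.

import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical illustrative data: 100 rows, p = 5 candidate predictors.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)),
                  columns=['x1', 'x2', 'x3', 'x4', 'x5'])
df['y'] = 2*df['x1'] - df['x3'] + rng.normal(size=100)

predictors = ['x1', 'x2', 'x3', 'x4', 'x5']
best_rsq, best_subset = -np.inf, None

# Fit every non-empty subset (2^5 - 1 = 31 models, each with an intercept)
# and keep the subset with the highest adjusted R^2.
for k in range(1, len(predictors) + 1):
    for subset in itertools.combinations(predictors, k):
        X = sm.add_constant(df[list(subset)])
        fit = sm.OLS(df['y'], X).fit()
        if fit.rsquared_adj > best_rsq:
            best_rsq, best_subset = fit.rsquared_adj, subset

print(best_subset, best_rsq)

Because the loop visits every subset, this approach is guaranteed to find the model with the highest adjusted R^2, but the number of fits grows as $2^p$, which is exactly the combinatorial explosion described above.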

Heuristic Techniques

Rather than exploring all $2^p$ possible regression models, we can instead use what we call heuristic algorithms, which attempt to find a model that optimizes some metric (like the adjusted R^2) in an efficient amount of time.

Unfortunately, with a heuristic algorithm we cannot guarantee that the model that is returned is the absolute best with respect to the desired metric, like the adjusted R^2. However, a heuristic algorithm is generally considered a good one if it tends to satisfy the following two properties.

  1. It tends to return solutions (i.e., models) that are considered "close enough" to the best possible solution (i.e., model).
  2. It tends to return solutions in a "fast enough" amount of time.
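For example, a greedy forward search that adds one variable at a time fits at most $p(p+1)/2$ models rather than $2^p$, and in practice it often lands on a model whose adjusted R^2 is close to the exhaustive-search optimum. Below is a minimal sketch of this idea (a preview of the forward selection algorithm discussed below), reusing the hypothetical df and predictors from the earlier snippet.

# Greedy forward selection: at each step, add the one remaining predictor
# that most improves adjusted R^2; stop when no addition improves it.
selected, remaining = [], list(predictors)
best_rsq = -np.inf

while remaining:
    scores = {}
    for var in remaining:
        X = sm.add_constant(df[selected + [var]])
        scores[var] = sm.OLS(df['y'], X).fit().rsquared_adj
    best_var = max(scores, key=scores.get)
    if scores[best_var] <= best_rsq:
        break  # no remaining variable improves the current model
    best_rsq = scores[best_var]
    selected.append(best_var)
    remaining.remove(best_var)

print(selected, best_rsq)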

In the sections below we will discuss four popular feature selection methods that seek to find the most parsimonious regression model out of a set of $p$ possible explanatory variables.

  1. Backward Elimination Algorithms
  2. Forward Selection Algorithms
  3. Regularization Techniques
  4. Principal Component Regression