Overview of Feature Selection Techniques
Exhaustive Search
If our goal is to select the model that is likely to give the best predictions on new datasets, then, now that we have a metric that accounts for the parsimoniousness of a model, we could, in theory, evaluate every possible model that could be built from a set of $p$ candidate explanatory variables. Then, we could select the model with the highest adjusted R^2.
However, as we discussed earlier, even without interaction terms, one can construct $2^p$ possible models based on whether you include or exclude each of these $p$ explanatory variables. If the number of explanatory variables that we're considering is very small, like the $p=5$ that we are currently considering, then evaluating $2^5=32$ candidate models is a realistic computational exercise.
However, even with a modest number of potential explanatory variables, say $p=30$, the number of regression models that we would need to evaluate, $2^{30}=1,073,741,824$, combinatorially explodes.
2**30
1073741824
The benefit of conducting this exhaustive search of all possible models, however, is that we know definitively which model has, say, the highest adjusted R^2.
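As a concrete illustration, here is a minimal sketch of an exhaustive search over all non-empty subsets of $p=5$ explanatory variables, scoring each candidate model by its adjusted R^2. The simulated data, the coefficients, and the `adjusted_r2` helper are invented for this example; in practice you would score models fit to your own dataset.

```python
import itertools
import numpy as np

# Simulate a dataset where (hypothetically) only the first two of the
# p = 5 explanatory variables actually influence the response.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

def adjusted_r2(X_sub, y):
    """Fit an intercept-plus-X_sub linear model and return adjusted R^2."""
    n, k = X_sub.shape
    Xd = np.column_stack([np.ones(n), X_sub])      # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares fit
    resid = y - Xd @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Enumerate every non-empty subset of the p variables (2^5 - 1 = 31 models)
# and keep the subset with the highest adjusted R^2.
subsets = [s for r in range(1, p + 1)
             for s in itertools.combinations(range(p), r)]
best = max(subsets, key=lambda s: adjusted_r2(X[:, list(s)], y))
print("best subset of variable indices:", best)
```

Because every candidate model is scored, the returned subset is guaranteed to be the adjusted-R^2 maximizer over all 31 models, which is exactly the guarantee that the heuristic methods below give up.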
Heuristic Techniques
Rather than exploring all $2^p$ possible regression models, we can instead use what we call heuristic algorithms, which attempt to find a model that optimizes some metric (like the adjusted R^2) in an efficient amount of time.
Unfortunately, with a heuristic algorithm we cannot guarantee that the model that is returned is the absolute best with respect to the desired metric, like the adjusted R^2. However, a heuristic algorithm is generally considered to be a good one if it tends to satisfy the following two properties.
- It tends to return solutions (i.e., models) that are considered "close enough" to the best possible solution (i.e., model).
- It tends to return solutions in a "fast enough" amount of time.
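To see why a heuristic can be "fast enough," consider how many model fits a greedy one-variable-at-a-time selection scheme requires in the worst case, compared with an exhaustive search. (The greedy count below assumes the standard scheme that adds or removes one variable per step: $p + (p-1) + \dots + 1 = p(p+1)/2$ fits.)

```python
# Number of model fits: exhaustive search vs. a greedy
# one-variable-at-a-time selection heuristic (worst case).
for p in (5, 10, 30):
    exhaustive = 2 ** p          # every subset of p variables
    greedy = p * (p + 1) // 2    # p + (p-1) + ... + 1 fits
    print(f"p={p}: exhaustive={exhaustive:,}, greedy={greedy}")
```

At $p=30$ the heuristic fits 465 models instead of over a billion, which is why these algorithms remain practical as $p$ grows.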
In the sections below we will discuss four popular heuristic feature selection methods that seek to find the most parsimonious regression model out of a set of $p$ possible explanatory variables.
- Backward Elimination Algorithms
- Forward Selection Algorithms
- Regularization Techniques
- Principal Component Regression