Detailed overview: how does the Forward Selection technique work for feature selection?

Similar to backward elimination, we have a few steps to follow here, and we'll go through them one by one as usual. But before diving in, you should know that this is going to be a bit more tedious than backward elimination, because you have to fit a whole bunch of simple linear regression models. With n features, the first pass alone needs n models, the next pass n − 1, and so on, so the total number of models can grow large pretty quickly. With that in mind, let's get started.

Step 1

The first step is very similar to that of backward elimination. Here, we select a significance level, which is the threshold we'll compare our P-values against. As you already know, a significance level of 5%, or 0.05, is common, so let's stick with that.
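Before walking through the remaining steps, here's a minimal setup sketch that the later snippets build on. The dataset here is a hypothetical, synthetic one (the names `X`, `y`, and `SL` are placeholders introduced just for illustration); substitute your own data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical dataset: 5 candidate features, only two of which
# (x1 and x4) actually drive the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)),
                 columns=[f"x{i}" for i in range(1, 6)])
y = 3.0 * X["x1"] - 2.0 * X["x4"] + rng.normal(size=200)

SL = 0.05  # Step 1: the chosen significance level
```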

Step 2

This is a pretty tedious step. In this second step, we fit a simple linear regression model for each feature in our dataset. So if there are 100 features, we fit 100 simple linear regression models. This can get boring and complicated depending on how many features you have, but it's also one of the most important steps in the process. Once we've fit all the simple linear regression models, we look at the P-value of each feature's coefficient and identify the feature with the lowest P-value.
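Here's what this step could look like in code, building on the setup above and assuming the statsmodels library for the regressions. We fit one simple linear regression per candidate feature and pull out the P-value of that feature's coefficient.

```python
# Step 2 sketch: one simple linear regression per candidate feature,
# then pick the feature whose coefficient has the lowest P-value.
p_values = {}
for col in X.columns:
    model = sm.OLS(y, sm.add_constant(X[[col]])).fit()
    p_values[col] = model.pvalues[col]  # P-value of the feature itself

best_feature = min(p_values, key=p_values.get)
print(best_feature, p_values[best_feature])
```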

Step 3

In the previous step, we identified the feature with the lowest P-value. Now we add that feature to models built on each of the remaining features. So while step 2 gave us simple linear regression models with one feature each, in this step we have one model fewer, but each of them has two features: the feature we just selected plus one of the remaining features. Once we've done this, we fit the models again and calculate the P-values.
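Continuing the sketch, this step pairs the selected feature with each of the remaining features and refits. The P-value we record is that of the newly added feature.

```python
# Step 3 sketch: the selected feature plus one remaining feature per model.
selected = [best_feature]
remaining = [c for c in X.columns if c not in selected]

p_values = {}
for col in remaining:
    model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
    p_values[col] = model.pvalues[col]  # P-value of the newly added feature
```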

Step 4

At this point, we have the P-values of all the models we created in the previous step. Again we identify the feature with the lowest P-value and check whether that P-value is less than the significance level, 0.05 in our example. If it is, we add that new feature to all the other models, essentially repeating step 3 with one more feature. We continue this loop until the lowest P-value we get is no longer below the significance level, and at that point we break out of the loop. (The whole loop is put together in a code sketch at the end of the section.)

Once we break out of the loop, the model we want is the one from the iteration before the iteration that broke the loop. Let me explain. Suppose the loop ran for 10 iterations, and in the 10th iteration we found that the lowest P-value was above the significance level. We then keep the model from the 9th iteration. We don't keep the last model, because the feature it added wasn't significant: its P-value was above 0.05. I hope that makes sense. :blush:
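To tie it all together, here's the whole procedure as one hedged sketch (`forward_selection` is just an illustrative helper, not a library function). Because we stop before adding the feature that failed the significance check, the features we return correspond exactly to the model from the iteration before the one that broke the loop.

```python
def forward_selection(X, y, sl=0.05):
    """Greedy forward selection: keep adding the feature with the lowest
    P-value until that P-value is no longer below the significance level."""
    selected = []
    remaining = list(X.columns)
    while remaining:
        # Fit one candidate model per remaining feature.
        p_values = {}
        for col in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            p_values[col] = model.pvalues[col]
        best = min(p_values, key=p_values.get)
        if p_values[best] >= sl:
            break  # lowest P-value no longer significant: stop here
        selected.append(best)
        remaining.remove(best)
    return selected

print(forward_selection(X, y, SL))
```

On the toy data from the setup, this should pick up x1 and x4, the two features that actually drive the target, though a spurious feature can occasionally sneak in by chance, which is a known quirk of P-value-based selection.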