We could remove features at random and re-measure the algorithm's accuracy after each change, but that is a tedious and slow process.

There are essentially four common ways to reduce over-fitting.

## 1. Reduce Features:

The most obvious option is to reduce the number of features. You can compute the correlation matrix of the features and remove those that are highly correlated with each other:

```python
import matplotlib.pyplot as plt

# Visualise the correlation matrix of the features
plt.matshow(dataframe.corr())
plt.show()
```
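To go beyond visual inspection, you can drop one column from each highly correlated pair programmatically. A minimal sketch with pandas; the toy data and the 0.95 threshold are illustrative, not prescriptive:

```python
import numpy as np
import pandas as pd

# Toy dataframe: "b" is almost a copy of "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=100)
dataframe = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=100),  # highly correlated with "a"
    "c": rng.normal(size=100),
})

# Keep only the upper triangle so each pair is considered once
corr = dataframe.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop any column whose correlation with an earlier column exceeds 0.95
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
reduced = dataframe.drop(columns=to_drop)
print(to_drop)  # ['b']
```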

## 2. Model Selection Algorithms:

You can use a model selection algorithm. These algorithms keep only the features with the greatest importance.

The problem with these techniques is that we might end up losing valuable information.
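For example, scikit-learn's `SelectFromModel` can wrap any estimator that exposes feature importances and keep only the features scoring above the mean importance. A sketch on the built-in iris data (the estimator and its settings are just one reasonable choice):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Fit a forest, then keep only features whose importance exceeds the mean
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
X_reduced = selector.fit_transform(X, y)

print(X.shape, X_reduced.shape)  # fewer columns remain after selection
```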

## 3. Feed More Data

You should aim to feed enough data to your models so that they are trained, tested and validated thoroughly. Aim to give 60% of the data to train the model, 20% to test it and 20% to validate it.

## 4. Regularization:

The aim of regularization is to keep all of the features but impose a constraint on the magnitude of the coefficients.

It is preferred because you do not have to discard features; instead, their coefficients are penalised. When these constraints are applied to the parameters, the model is less prone to over-fitting because it produces a smoother function.

Regularization parameters, known as penalty factors, are introduced to control the coefficients and ensure that the model does not over-train on the training data.

Increasing the penalty factor shrinks the coefficients further, which reduces overfitting: when the coefficients take large values, the regularization term adds a large cost to the optimisation function.

There are two common regularization techniques:

**1. LASSO**

Lasso doubles as a feature selection tool: it can eliminate non-important features entirely. It adds a penalty equal to the absolute value of the coefficients (the L1 norm). This stops individual features from carrying excessive weight in the prediction, and some of the weights end up being exactly zero. This means that the data of those features will not be used by the model at all.

```python
from sklearn import linear_model

model = linear_model.Lasso(alpha=0.1)
model.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
```

**2. RIDGE**

Ridge adds a penalty equal to the square of the coefficients (the L2 norm). As a result, some of the weights become very close to zero, but not exactly zero, which smooths out the effect of the features.

```python
from sklearn.linear_model import Ridge

model = Ridge(alpha=1.0)
# Sample data mirroring the LASSO example above
model.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
```
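The difference between the two penalties is easiest to see on synthetic data where one feature is pure noise: LASSO drives its weight to exactly zero, while ridge only shrinks it. A small sketch (the data and alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: y depends only on the first feature; the second is noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print(lasso.coef_)  # second coefficient is exactly 0
print(ridge.coef_)  # second coefficient is small but non-zero
```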