What is the difference between Underfitting and Overfitting?

board-infinity · 6 October 2022 06:04

Underfitting:

A statistical model or a machine learning algorithm is said to have underfitting when it cannot capture the underlying trend of the data. Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or the algorithm does not fit the data well enough. It usually happens when we have fewer data to build an accurate model and also when we try to build a linear model with fewer non-linear data.

In such cases, the rules of the machine learning model are too easy and flexible to be applied to such minimal data and therefore the model will probably make a lot of wrong predictions.
Techniques to reduce underfitting:

Increase model complexity
Increase the number of features, performing feature engineering
Remove noise from the data.
Increase the number of epochs or increase the duration of training to get better results.

Overfitting:

A statistical model is said to be overfitted when we train it with a lot of data. When a model gets trained with so much data, it starts learning from the noise and inaccurate data entries in our data set. Then the model does not categorize the data correctly, because of too many details and noise. The causes of overfitting are the non-parametric and non-linear methods because these types of machine learning algorithms have more freedom in building the model based on the dataset and therefore they can really build unrealistic models. A solution to avoid overfitting is using a linear algorithm if we have linear data or using the parameters like the maximal depth if we are using decision trees.

Techniques to reduce overfitting:

Increase training data.
Reduce model complexity.
Early stopping during the training phase (have an eye over the loss over the training period as soon as loss begins to increase stop training).
Ridge Regularization and Lasso Regularization
Use dropout for neural networks to tackle overfitting.