What are residuals in machine learning?

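Residuals in a statistical or machine learning model are the differences between the observed values and the values the model predicts, usually computed as observed minus predicted. They are a diagnostic measure used when assessing the quality of a model. They are often loosely called errors, although strictly speaking a residual is an estimate of the unobservable error term, computed from the fitted model.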
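
As a minimal sketch of the definition above, the following Python snippet computes residuals with NumPy. The observed and predicted values are made-up numbers chosen purely for illustration, and the variable names are assumptions rather than part of any particular library.

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only).
observed = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 10.0])

# Residual = observed value minus predicted value, one per data point.
residuals = observed - predicted

print(residuals)
```

With this sign convention, a positive residual means the model predicted too low for that point, and a negative residual means it predicted too high.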

Why are residuals important?

  • Residuals are one of the main tools for judging the quality of a model. You can examine their magnitude, whether they form a pattern, or both.
  • Where the residuals are all 0, the model predicts perfectly. The further residuals are from 0, the less accurate the model is. In the case of linear regression, the greater the sum of squared residuals, the smaller the R-squared statistic, all else being equal.
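  • Where the average residual is not 0, it implies that the model is systematically biased (i.e., consistently over- or under-predicting). With residuals defined as observed minus predicted, a positive average means the model tends to under-predict, and a negative average means it tends to over-predict.
  • Where residuals contain patterns (for example, a curve or a trend when plotted against a predictor or against the predicted values), it implies that the model is qualitatively wrong, as it is failing to explain some properties of the data. Such patterns also violate the assumptions behind many standard statistical tests, such as independent residuals with constant variance, so the results of those tests become unreliable.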
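
The checks in the list above can be sketched in Python roughly as follows. This is an illustrative example under stated assumptions, not a standard recipe: the helper residual_summary is a hypothetical name, the data are synthetic, and a straight line is deliberately fitted to curved data so that the residuals show a pattern.

```python
import numpy as np

def residual_summary(observed, predicted):
    """Hypothetical helper: basic residual diagnostics for one model."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    ssr = float(np.sum(residuals ** 2))                     # sum of squared residuals
    sst = float(np.sum((observed - observed.mean()) ** 2))  # total sum of squares
    return {
        "mean_residual": float(residuals.mean()),  # non-zero suggests systematic bias
        "ssr": ssr,
        "r_squared": 1.0 - ssr / sst,              # falls as ssr grows
    }

# Synthetic curved data (illustrative only): y = x^2 on x = -3..3.
x = np.arange(-3, 4, dtype=float)
y = x ** 2

# Case 1: least-squares straight line fitted to the curved data.
slope, intercept = np.polyfit(x, y, 1)
line_pred = slope * x + intercept
line_summary = residual_summary(y, line_pred)
line_residuals = y - line_pred  # positive at the ends, negative in the middle

# Case 2: predictions that are always 2 units too high.
biased_summary = residual_summary(y, y + 2.0)

print(line_summary)
print(line_residuals)
print(biased_summary)
```

In the first case the average residual is essentially 0, yet the residuals are positive at both ends and negative in the middle, which is the kind of pattern that signals a wrong model form. In the second case the average residual is negative, reflecting consistent over-prediction. This is why both the average and the shape of the residuals are worth checking.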