Heteroscedasticity💡

In Linear Regression, Heteroscedasticity simply means that the residuals do not all have the same variance.:bulb:

In other words, the errors come from probability distributions with different variances, which violates the constant-variance (homoscedasticity) assumption of Linear Regression.:negative_squared_cross_mark:

In essence, observations with low variance conform to the underlying relationship more closely than observations with high variance, yet ordinary least squares treats them all equally. The coefficient estimates stay unbiased but become less precise, and the usual standard errors are no longer reliable, so your p-values and confidence intervals can be misleading.:interrobang:

The quickest way to check for Heteroscedasticity is to plot the residuals against the predictions and look for a pattern. If the spread of the residuals changes with the predicted value, for example in a funnel shape, Heteroscedasticity may be present.:chart_with_upwards_trend:
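
If you cannot eyeball a plot, a rough numeric version of the same check is to sort the residuals by predicted value and compare their spread across bins. The sketch below is only an illustration: the data is simulated (noise that grows with x is an assumption made for the demo), and `fit_line` and `residual_spread_by_bin` are hypothetical helper names, not part of any library.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_spread_by_bin(fitted, residuals, n_bins=3):
    """Sort by fitted value, split into equal-sized bins, return the residual std per bin."""
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    return [
        statistics.pstdev([r for _, r in pairs[i * size:(i + 1) * size]])
        for i in range(n_bins)
    ]

# Simulated data (assumption for the demo): the noise std grows with x.
random.seed(0)
x = [random.uniform(1, 10) for _ in range(300)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# A spread that rises from the first bin to the last is the "funnel" pattern.
spreads = residual_spread_by_bin(fitted, residuals)
print(spreads)
```

Roughly equal numbers across the bins suggest constant variance; a clear trend is a hint to look closer, not a formal test.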

How you fix Heteroscedasticity depends on your data and your problem. One option is to transform the affected variables so that they measure a slightly different quantity, such as percentages or rates instead of raw values.:hammer_and_wrench:
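
Here is a small sketch of why switching to a rate can help. Everything in it is assumed for illustration: the city populations, the event counts, the true rate of 0.02, and the idea that the noise scales with population are all simulated, and `spread_ratio` is a hypothetical helper.

```python
import random
import statistics

random.seed(1)
TRUE_RATE = 0.02  # assumed for the simulation only

# Hypothetical data: event counts whose noise scales with population size.
population = [random.uniform(1_000, 100_000) for _ in range(300)]
counts = [p * TRUE_RATE * (1 + random.gauss(0, 0.1)) for p in population]

# Transform the raw count into a rate (events per person).
rates = [c / p for c, p in zip(counts, population)]

def spread_ratio(size, deviations):
    """Std of the deviations in the largest third divided by that in the smallest third."""
    pairs = sorted(zip(size, deviations))
    third = len(pairs) // 3
    small = [d for _, d in pairs[:third]]
    large = [d for _, d in pairs[-third:]]
    return statistics.pstdev(large) / statistics.pstdev(small)

# Deviations from the known (simulated) relationship, before and after the transform.
count_ratio = spread_ratio(population, [c - TRUE_RATE * p for c, p in zip(counts, population)])
rate_ratio = spread_ratio(population, [r - TRUE_RATE for r in rates])

print(count_ratio, rate_ratio)
```

A ratio well above 1 means the spread grows with size; a ratio near 1 means it stays about the same. Whether a rate transform helps on real data depends on how the noise actually scales, so treat this as a demonstration of the idea rather than a guarantee.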

Another, less intuitive option is weighted regression: you assign higher weights to the observations with low variance and lower weights to the observations with high variance, so that the more reliable observations count for more in the estimate.
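
A minimal sketch of weighted regression with a single feature, using only the standard library. The data is simulated, and the weights of 1 / x² rest on the assumption that the noise standard deviation is proportional to x (true here only because the demo data is built that way); in practice you would have to estimate or reason about the variances. `fit_weighted_line` is a hypothetical helper name.

```python
import random

def fit_weighted_line(x, y, w):
    """Weighted least squares for one feature: returns (intercept, slope)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / total  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated data (assumption for the demo): true line y = 2 + 3x, noise std = 0.5 * x.
random.seed(0)
x = [random.uniform(1, 10) for _ in range(300)]
y = [2 + 3 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# Weight each observation by the inverse of its (assumed) variance.
weights = [1 / xi ** 2 for xi in x]
wls_intercept, wls_slope = fit_weighted_line(x, y, weights)

# Equal weights reduce to ordinary least squares, for comparison.
ols_intercept, ols_slope = fit_weighted_line(x, y, [1.0] * len(x))

print("weighted:  ", wls_intercept, wls_slope)
print("unweighted:", ols_intercept, ols_slope)
```

Both fits estimate the same line; the weighted one simply lets the low-variance observations have more say, which is where the better estimation comes from.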

#datascience #machinelearning #statistics