Multicollinearity🔍

Multicollinearity occurs when multiple features in a regression model are correlated with, or dependent on, each other to some extent. A change in the value of one feature tends to come with a change in the features collinear to it. In other words, such features add little new information to the model.:repeat_one:

They can, in fact, lead to Overfitting, since the model may give unpredictable results on unseen data. This in turn means high Standard Errors for the coefficients and low Statistical Power.:bulb:

To measure Multicollinearity, the 2 most common techniques are the Correlation Matrix and the Variance Inflation Factor (VIF). A Correlation Matrix simply contains the correlation of each feature with every other feature. Values close to +1 or -1 signify high correlation. :1234:
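
As a rough illustration of the Correlation Matrix idea, here is a minimal sketch using NumPy. The features `x1`, `x2` and `x3` are made up for this example: `x2` is deliberately built as a noisy multiple of `x1`, while `x3` is independent.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200

# Hypothetical features (not from any real dataset):
# x2 is nearly a scaled copy of x1, x3 is unrelated to both.
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# rowvar=False: columns are features, rows are observations.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

The off-diagonal entry for the (`x1`, `x2`) pair should come out close to 1, while the entries involving `x3` should stay near 0.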

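VIF is another way to quantify collinearity. For each feature it is computed as 1 / (1 - R²), where R² comes from regressing that feature on all the other features. A value of 1 means no collinearity, and a value above 5 is commonly treated as high collinearity.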
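
A minimal sketch of how VIF could be computed by hand, assuming the standard definition 1 / (1 - R²) with each feature regressed on the remaining ones. The helper name `vif` and the toy features are assumptions for illustration only.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the other features, with an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)  # grows without bound as r2 approaches 1
    return out

# Hypothetical data: x2 is almost a scaled copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

Here `x1` and `x2` should land far above the 5 cutoff, while `x3` should stay close to 1. In practice, a library routine such as `variance_inflation_factor` from `statsmodels.stats.outliers_influence` can be used instead of a hand-rolled loop.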

When a pair of features has a correlation above a chosen threshold, one of the two is usually dropped from the dataset. This reduces the number of dimensions and makes the model less complex. There are other techniques to deal with Multicollinearity, such as linear combinations of the features, PCA, etc.:hammer_and_wrench:
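
One possible way to apply the threshold-based dropping described above is a simple greedy pass over the correlation matrix. The function name `drop_correlated`, the 0.9 threshold and the toy features are all assumptions for this sketch, and which member of a correlated pair gets kept is an arbitrary choice here (the first one seen).

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Keep a feature only if its absolute correlation with every
    already-kept feature is at or below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Hypothetical data: x2 is almost a scaled copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

X_reduced, kept = drop_correlated(X, ["x1", "x2", "x3"])
print(kept, X_reduced.shape)
```

With this data, `x2` should be dropped because it is almost perfectly correlated with `x1`, leaving `x1` and `x3`.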

#machinelearning #datascience