Why is multicollinearity bad?


Multicollinearity exists whenever an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation. It is a problem because it inflates the standard errors of the affected regression coefficients, which undermines their statistical significance. Other things being equal, the larger the standard error of a regression coefficient, the less likely it is that this coefficient will be statistically significant.
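To put a number on that inflation, here is a minimal sketch using simulated data. It assumes a model with exactly two predictors, where the variance inflation factor (VIF) reduces to 1 / (1 - r^2), with r the correlation between the two predictors; the standard error of each coefficient grows by roughly the square root of the VIF compared with uncorrelated predictors. The helper names (pearson, vif_two_predictors) and the simulated variables are made up for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# Hypothetical data: x1 is a baseline predictor.
x1 = [rng.gauss(0, 1) for _ in range(n)]
# An unrelated second predictor.
x2_independent = [rng.gauss(0, 1) for _ in range(n)]
# A second predictor that is nearly a copy of x1 (x1 plus a little noise).
x2_collinear = [a + 0.1 * rng.gauss(0, 1) for a in x1]

vif_low = vif_two_predictors(x1, x2_independent)
vif_high = vif_two_predictors(x1, x2_collinear)

# sqrt(VIF) is the factor by which the coefficient's standard error grows.
print(f"independent: VIF = {vif_low:.2f}, SE multiplier = {math.sqrt(vif_low):.2f}")
print(f"collinear:   VIF = {vif_high:.2f}, SE multiplier = {math.sqrt(vif_high):.2f}")
```

With more than two predictors, the same idea applies, but r^2 is replaced by the R^2 from regressing each predictor on all of the others.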

Multicollinearity means that several variables are essentially measuring the same thing. There is no point in having more than one measure of the same thing in a model.

So logically, you don’t want more than one measure of the same thing. It adds little to the predictive capability of the model and it can make the coefficient estimates less stable. Instead, you want measures of substantially different things in order to build your model and make it as accurate as possible.
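The claim that a second measure of the same thing adds little can be checked with a small simulation. This is only a sketch under stated assumptions: the data are simulated, the names (pearson, r2_one, r2_both) are made up for illustration, and the two-predictor R^2 is computed from the standard identity R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2) rather than by fitting a regression.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 500

# Hypothetical data: y depends on x1; x2 is a noisy copy of x1.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.1 * rng.gauss(0, 1) for a in x1]
y = [2.0 * a + rng.gauss(0, 1) for a in x1]

r_y1 = pearson(y, x1)
r_y2 = pearson(y, x2)
r_12 = pearson(x1, x2)

# R^2 of the model with x1 only is the squared correlation.
r2_one = r_y1 ** 2
# R^2 of the model with both x1 and x2 (two-predictor identity).
r2_both = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

print(f"R^2 with x1 only:   {r2_one:.4f}")
print(f"R^2 with x1 and x2: {r2_both:.4f}")
print(f"gain from adding the redundant variable: {r2_both - r2_one:.4f}")
```

The in-sample R^2 cannot go down when a variable is added, so the cost of the redundant predictor shows up in the inflated standard errors, not in the fit itself.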

It is not correct to say that multicollinearity is bad. It is neutral. It is something you need to be aware of so that you can adjust for it and build the best predictive model to help answer your research question.