The basic assumptions of the linear regression algorithm are as follows:
- Linearity: The relationship between the features and the target is linear.
- Homoscedasticity: The error term has a constant variance.
- No multicollinearity: The features are not linearly dependent on each other.
- Independence: Observations are independent of each other.
- Normality: The errors (residuals) follow a normal distribution.
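To make these assumptions concrete, here is a minimal sketch (using NumPy and entirely synthetic data, so all names and values are illustrative) that generates data satisfying them and fits a linear regression by ordinary least squares:

```python
import numpy as np

# Synthetic data that obeys the assumptions above: a linear signal plus
# independent, constant-variance Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                  # two independent features
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1.0, size=200)

# Ordinary least squares via a least-squares solve.
A = np.column_stack([np.ones(len(X)), X])              # prepend intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - A @ beta
print(beta)              # coefficients close to the true [3.0, 2.0, -1.5]
print(residuals.mean())  # near zero, as the zero-mean assumption requires
```

Because the data really are linear with well-behaved noise, the recovered coefficients land close to the true values; when the assumptions are violated, the same fit can be systematically biased.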
Now, let’s break these assumptions into different categories:
Assumptions about the form of the model:
It is assumed that there exists a linear relationship between the dependent and the independent variables. This is often referred to as the ‘linearity assumption’.
Assumptions about the residuals:
- Normality assumption: The error terms, ε(i), are normally distributed.
- Zero mean assumption: The residuals have a mean value of zero.
- Constant variance assumption: The residual terms all have the same (but unknown) variance, σ². This is also called the assumption of homogeneity of variance, or homoscedasticity.
- Independent error assumption: The residual terms are independent of each other, i.e., their pairwise covariances are zero.
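The residual assumptions above can each be probed with simple diagnostics. The sketch below (again on synthetic data, with illustrative thresholds) checks the zero-mean, constant-variance, and independence assumptions in turn:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 300)     # noise has constant variance

A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Zero mean: with an intercept in the model, OLS forces this numerically.
print(resid.mean())

# Constant variance: compare residual variance at low x vs. high x.
order = np.argsort(x)
lo, hi = resid[order[:150]], resid[order[150:]]
print(lo.var() / hi.var())   # a ratio near 1 is consistent with homoscedasticity

# Independence: lag-1 autocorrelation of the residuals.
r = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(r)                     # near 0 when errors are independent
```

These are informal checks; in practice one would look at residual plots or use formal tests (e.g., Breusch–Pagan for heteroscedasticity, Durbin–Watson for autocorrelation).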
Assumptions about the estimators:
- The independent variables are measured without error.
- There does not exist a linear dependency between the independent variables, i.e. there is no multicollinearity in the data.
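One common way to detect multicollinearity is the variance inflation factor (VIF): regress each feature on all the others and compute 1/(1 − R²). The sketch below uses a hypothetical feature matrix in which one column is deliberately built as a near-copy of another, so its VIF should blow up:

```python
import numpy as np

# Hypothetical features: x2 is almost a linear copy of x0.
rng = np.random.default_rng(2)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + rng.normal(scale=0.05, size=500)   # nearly collinear with x0
X = np.column_stack([x0, x1, x2])

def vif(X, j):
    """Variance inflation factor of column j: regress it on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

print([round(vif(X, j), 1) for j in range(3)])
# x0 and x2 show large VIFs; the independent x1 stays near 1
```

A rule of thumb often quoted is that a VIF above 5 or 10 signals problematic multicollinearity; here the collinear pair far exceeds that while the independent feature does not.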