When should it be preferred to the Gradient Descent method instead of the Normal Equation in Linear Regression Algorithm?

To answer the given question, let’s first understand the difference between the Normal equation and Gradient descent method for linear regression:

Gradient descent:

  • Needs hyper-parameter tuning for alpha (learning parameter).
  • It is an iterative process.
  • Time complexity- O(kn2)
  • Preferred when n is extremely large.

Normal Equation:

  • No such need for any hyperparameter.
  • It is a non-iterative process.
  • Time complexity- O(n3) due to evaluation of XTX.
  • Becomes quite slow for large values of n.


‘k’ represents the maximum number of iterations used for the gradient descent algorithm, and

‘n’ is the total number of observations present in the training dataset.

Clearly, if we have large training data, a normal equation is not preferred for use due to very high time complexity but for small values of ‘n’, the normal equation is faster than gradient descent.