To answer the given question, let’s first understand the difference between the Normal equation and Gradient descent method for linear regression:

### Gradient Descent

- Needs hyper-parameter tuning for alpha (the learning rate).
- It is an iterative process.
- Time complexity: O(kn²).
- Preferred when n is extremely large.
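As a minimal sketch of these points, here is batch gradient descent for linear regression on a small hypothetical dataset (the data, the learning rate `alpha = 0.1`, and the iteration cap `k = 500` are illustrative choices, not prescribed values):

```python
import numpy as np

# Hypothetical example: recover y = 1 + 2x with batch gradient descent.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]  # bias column + one feature
y = X @ np.array([1.0, 2.0])

alpha = 0.1      # learning rate: the hyper-parameter that must be tuned
k = 500          # maximum number of iterations
theta = np.zeros(2)

for _ in range(k):
    grad = X.T @ (X @ theta - y) / len(y)  # gradient of the mean squared error
    theta -= alpha * grad                  # iterative update step

print(theta)  # approaches [1, 2]
```

If `alpha` is set too large the updates diverge, and if too small the loop needs many more iterations, which is exactly why the learning rate requires tuning.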

### Normal Equation

- No need for any hyperparameter.
- It is a non-iterative process.
- Time complexity: O(n³), due to computing and inverting XᵀX.
- Becomes quite slow for large values of n.
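The closed-form solution is θ = (XᵀX)⁻¹Xᵀy. A minimal sketch on the same kind of hypothetical data (using `np.linalg.solve` rather than forming the inverse explicitly, which is the standard numerically safer choice):

```python
import numpy as np

# Hypothetical example: the same fit, solved in one shot via the normal equation.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]  # bias column + one feature
y = X @ np.array([1.0, 2.0])

# theta = (X^T X)^{-1} X^T y, solved as a linear system: no iterations, no alpha
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [1, 2]
```

Note that there is nothing to tune here, but solving the n×n system XᵀX is what drives the O(n³) cost.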

**where,**

**‘k’** represents the maximum number of iterations used by the gradient descent algorithm, and

**‘n’** is the number of features in the training dataset (XᵀX is an n×n matrix, hence the O(n³) cost of inverting it).

Clearly, when n is large the normal equation is not preferred because of its cubic time complexity, but for small values of ‘n’ it is faster than gradient descent, since it needs no iterations and no learning-rate tuning.