Why don’t we need feature scaling when we use the normal equation?


Feature scaling makes the optimization problem better conditioned, which leads to faster and more reliable convergence for iterative methods such as gradient descent. The normal equation, however, is a closed-form solution: it computes the parameters directly in one step via $\theta = (X^\top X)^{-1} X^\top y$, with no iterative updates whose convergence could be slowed by poorly scaled features. Therefore feature scaling is not necessary. (One caveat: with wildly different feature magnitudes, $X^\top X$ can become numerically ill-conditioned, so scaling can still help numerical stability in extreme cases.)
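As a quick sanity check, here is a minimal NumPy sketch (with made-up data) showing that the normal equation fits equally well with and without standardizing the feature, since no iterative convergence is involved:

```python
import numpy as np

# Hypothetical data: one feature on a large scale (0..1000) plus noise.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 1000, size=(100, 1))
y = 3.0 * X_raw[:, 0] + 5.0 + rng.normal(0, 1, 100)

def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    Xb = np.c_[np.ones(len(X)), X]               # prepend intercept column
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)  # solve, not explicit inverse

# Fit on the raw, unscaled feature: one direct solve, no iterations to tune.
theta_raw = normal_equation(X_raw, y)

# Fit on the standardized feature: the fitted values come out the same.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
theta_scaled = normal_equation(X_scaled, y)

pred_raw = np.c_[np.ones(len(X_raw)), X_raw] @ theta_raw
pred_scaled = np.c_[np.ones(len(X_scaled)), X_scaled] @ theta_scaled
print(np.allclose(pred_raw, pred_scaled))  # True: scaling doesn't change the fit
```

Standardizing is an invertible affine transform of the feature, so (with an intercept included) both designs span the same column space and yield identical least-squares predictions, up to floating-point rounding.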