Why don’t we need feature scaling when we use the normal equation?


Feature scaling makes the optimization problem better conditioned, which leads to faster and more reliable convergence for iterative methods such as gradient descent. The normal equation, however, is a closed-form solution: it computes the parameters directly in one step via $\theta = (X^\top X)^{-1} X^\top y$, with no iterative updates whose convergence could be slowed by poorly scaled features. Therefore feature scaling is not necessary. (One caveat: with wildly different feature magnitudes, $X^\top X$ can become numerically ill-conditioned, so scaling can still help numerical stability in extreme cases.)
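As a quick sanity check, here is a minimal NumPy sketch (with made-up data) showing that the normal equation fits equally well with and without standardizing the feature, since no iterative convergence is involved:

```python
import numpy as np

# Hypothetical data: one feature on a large scale (0..1000) plus noise.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 1000, size=(100, 1))
y = 3.0 * X_raw[:, 0] + 5.0 + rng.normal(0, 1, 100)

def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    Xb = np.c_[np.ones(len(X)), X]               # prepend intercept column
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)  # solve, not explicit inverse

# Fit on the raw, unscaled feature: one direct solve, no iterations to tune.
theta_raw = normal_equation(X_raw, y)

# Fit on the standardized feature: the fitted values come out the same.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
theta_scaled = normal_equation(X_scaled, y)

pred_raw = np.c_[np.ones(len(X_raw)), X_raw] @ theta_raw
pred_scaled = np.c_[np.ones(len(X_scaled)), X_scaled] @ theta_scaled
print(np.allclose(pred_raw, pred_scaled))  # True: scaling doesn't change the fit
```

Standardizing is an invertible affine transform of the feature, so (with an intercept included) both designs span the same column space and yield identical least-squares predictions, up to floating-point rounding.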