Why Mean Squared Error?

In Machine Learning, our main goal is to minimize the error, which is measured by a loss function. There are multiple ways to measure this loss. Focusing only on regression, MSE is one of the usual choices. But why?:thinking:

Let’s consider the simplest loss functions and why they aren’t up to the mark.

Summation of Errors: The most basic loss function is simply the sum of the errors over all training points, where each error is the difference between the predicted value and the actual value.
However, this is a blunder when some of the errors are positive and some are negative: they cancel each other out, and the total error is largely underestimated.
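A quick sketch of the cancellation problem, using made-up toy numbers:

```python
# Hypothetical toy data: predictions that are off by +2, 0, and -2.
actual    = [3.0, 5.0, 7.0]
predicted = [5.0, 5.0, 5.0]

errors = [p - a for p, a in zip(predicted, actual)]
print(sum(errors))  # the +2 and -2 cancel, so the total "loss" is 0.0
```

The model is clearly wrong on two of the three points, yet the summed error reports a perfect 0.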

Solution? Absolute errors.:bulb:

Taking the absolute values solves the cancellation issue. However, the derivative of this loss function doesn't exist at 0, and we need the derivative to find the optimum point with gradient-based methods. So that won't work here.:thinking:

Solution? Squared errors.:bulb:

Squaring the errors keeps them positive and differentiable. But ideally, more training points should mean a better fit and a lower loss, not a higher one. With SSE (the Sum of Squared Errors), the total loss keeps growing as the number of training points increases!:thinking:
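To see the scaling problem, here is a sketch with made-up data where every prediction is off by exactly 1.0, so the model quality is identical for both dataset sizes:

```python
def sse(actual, predicted):
    # Sum of Squared Errors over all training points.
    return sum((p - a) ** 2 for p, a in zip(actual, predicted))

# Same per-point error (1.0 everywhere), different dataset sizes.
print(sse([0.0] * 10,   [1.0] * 10))    # 10.0
print(sse([0.0] * 1000, [1.0] * 1000))  # 1000.0 — same quality, 100x the loss
```

The loss value now depends on the dataset size, not just on how good the model is.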

Solution? Mean Squared Errors!

With MSE, we take the average of the squared errors by dividing their sum by N. The larger N gets, the smaller each error's relative contribution, so the loss no longer inflates with dataset size. Problems solved!:zap::zap:
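Repeating the earlier sketch with the averaged version shows the fix (same made-up data, every prediction off by 1.0):

```python
def mse(actual, predicted):
    # Mean Squared Error: average the squared errors over N points.
    n = len(actual)
    return sum((p - a) ** 2 for p, a in zip(actual, predicted)) / n

print(mse([0.0] * 10,   [1.0] * 10))    # 1.0
print(mse([0.0] * 1000, [1.0] * 1000))  # 1.0 — independent of dataset size
```

Both datasets now report the same loss, because the per-point error is the same.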

#datascience #machinelearning
