# What is the difference between squared error and absolute error?

Say you define your error as PredictedValue − ActualValue.

Then the error in estimation can be of two kinds,

1. You underestimate the value, in which case your error will be negative.
2. You overestimate the value, in which case your error will be positive.

When you average these out, you might get a very low error if you are underestimating and overestimating equally, because the positive and negative errors cancel each other out. To stop the negative errors from cancelling the positive ones when taking the mean, we square each error before averaging.
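The cancellation is easy to see with a few made-up numbers. The following is a minimal sketch; the error values are hypothetical and chosen only so that the over- and underestimates balance exactly.

```python
# Hypothetical signed errors (PredictedValue - ActualValue):
# two underestimates and two overestimates of equal size.
errors = [-2.0, 2.0, -1.0, 1.0]

# Plain mean: positive and negative errors cancel.
mean_error = sum(errors) / len(errors)

# Mean of squared errors: every term is non-negative, so nothing cancels.
mean_squared_error = sum(e ** 2 for e in errors) / len(errors)

print(mean_error)          # 0.0
print(mean_squared_error)  # 2.5
```

The plain mean suggests a perfect model, while the mean of the squares shows that every single prediction was off.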

A better question would be: why not use the absolute difference instead of squaring the errors? This has no definite answer, as it is very application-specific.

Squaring is the natural choice when you want to emphasize the spread of your errors, that is, when you want to penalize errors that lie farther from the mean (usually 0 in machine learning, a user parameter in statistics) more heavily. Scale matters too: squaring magnifies errors larger than 1 and shrinks errors smaller than 1. At small scales, where your errors are less than 1 because the values themselves are small, taking just the absolute value might not give the algorithm the best feedback.
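To make the difference in penalty concrete, here is a small sketch comparing the two measures. The helper names and the error values are assumptions made up for illustration, not part of the question.

```python
def mean_absolute_error(errors):
    """Average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)

def mean_squared_error(errors):
    """Average of the squared errors; large errors dominate."""
    return sum(e ** 2 for e in errors) / len(errors)

# Hypothetical error sets with the same total absolute error.
evenly_spread = [1.0, -1.0, 1.0, -1.0]  # four small misses
one_big_miss = [0.0, 0.0, 0.0, 4.0]     # three perfect, one large miss

print(mean_absolute_error(evenly_spread))  # 1.0
print(mean_absolute_error(one_big_miss))   # 1.0
print(mean_squared_error(evenly_spread))   # 1.0
print(mean_squared_error(one_big_miss))    # 4.0

# Errors below 1 shrink when squared: 0.1 becomes about 0.01.
small_errors = [0.1, -0.1]
print(mean_absolute_error(small_errors))
print(mean_squared_error(small_errors))
```

The absolute error cannot tell the two sets apart, while the squared error rates the single large miss four times worse.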

The reasoning above does not apply everywhere, though; you have to consider your problem carefully and then decide. Sometimes you want your error to be in the same units as your data. In that case, you square the error for each observation, take the mean of those squares, and then take the square root of that mean (the root mean squared error). This still gives extra weight to widely spread errors while keeping the units the same as the data.
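The square-root-of-the-mean recipe can be sketched as follows. The function name and the sample values are hypothetical, chosen only to illustrate the steps.

```python
import math

def root_mean_squared_error(predicted, actual):
    """Square each error, average the squares, then take the square root."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical predictions and true values, in the same units (say, metres).
predicted = [3.0, 5.0, 2.0, 7.0]
actual = [2.0, 5.0, 4.0, 7.0]

# Errors are 1, 0, -2, 0, so the mean squared error is 1.25 (squared metres)
# and the result is sqrt(1.25), back in metres.
rmse = root_mean_squared_error(predicted, actual)
print(rmse)
```

For comparison, the mean absolute error of the same data is 0.75; the root mean squared error is larger because the single error of size 2 counts for more.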

TL;DR: Squaring keeps negative errors from cancelling positive ones when you take the mean. Both absolute and squared errors are used, depending on the use case.

The squared difference has nicer mathematical properties; it’s continuously differentiable (nice when you want to minimize it), it’s a sufficient statistic for the Gaussian distribution, and it’s (a version of) the L2 norm which comes in handy for proving convergence and so on.
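As a rough illustration of why differentiability helps with minimization, the sketch below fits a single constant prediction by gradient descent on the mean squared error. The function names, learning rate, step count, and data are all assumptions picked for the example.

```python
def squared_error_gradient(c, ys):
    """Derivative of mean((c - y)^2) with respect to c: mean of 2 * (c - y)."""
    return sum(2.0 * (c - y) for y in ys) / len(ys)

def absolute_error_slope(e):
    """Slope of |e|: jumps from -1 to +1 at e = 0, where it is undefined."""
    if e > 0:
        return 1.0
    if e < 0:
        return -1.0
    return 0.0  # a conventional choice; the true derivative does not exist here

def fit_constant(ys, learning_rate=0.1, steps=200):
    """Plain gradient descent on the mean squared error, starting from 0."""
    c = 0.0
    for _ in range(steps):
        c -= learning_rate * squared_error_gradient(c, ys)
    return c

ys = [1.0, 2.0, 6.0]  # hypothetical observations
best_constant = fit_constant(ys)
print(best_constant)  # approaches the mean of ys, which is 3.0
```

The squared-error gradient shrinks smoothly to zero as the estimate approaches the minimum, so fixed-size steps settle down on their own. The slope of the absolute error keeps magnitude 1 right up to the minimum, which is one reason it is harder to work with.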

The mean absolute deviation (the absolute value notation you suggest) is also used as a measure of dispersion, but it’s not as “well-behaved” as the squared error.