What is outlier treatment?

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. This definition leaves it to the analyst (or a consensus process) to decide what counts as abnormal. Outlier treatment is the process of detecting such observations and deciding whether to remove, correct, or down-weight them.
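One common convention for deciding what counts as "abnormal" is the interquartile range (IQR) rule. The sketch below is only an illustration: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample data are assumptions, not something prescribed by the definition above.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one value far from the rest.
sample = [10, 12, 11, 13, 12, 11, 14, 95]
flagged = iqr_outliers(sample)
print(flagged)
```

Changing `k` changes how strict the rule is, which is exactly the analyst's judgment call mentioned in the definition.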

Impact of outliers:

Many machine learning algorithms are sensitive to the range and distribution of attribute values. Outliers can spoil and mislead the training process, resulting in longer training times, less accurate models, and ultimately poorer results.
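A small sketch of this effect, assuming an ordinary least-squares line fit as the "model" and made-up data: a single corrupted target value is enough to pull the fitted slope far away from the true one. The helper `fit_line` is hypothetical and written out by hand so the example needs no external library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4, 5]
clean = [2 * x + 1 for x in xs]   # points exactly on y = 2x + 1
corrupted = clean[:-1] + [50]     # last target replaced by an outlier

clean_slope, clean_intercept = fit_line(xs, clean)
bad_slope, bad_intercept = fit_line(xs, corrupted)
print("clean fit:    ", clean_slope, clean_intercept)
print("corrupted fit:", bad_slope, bad_intercept)
```

The clean data recovers a slope of 2, while the fit on the corrupted data is dominated by the one extreme point.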

  • Univariate search method: This method searches one dimension at a time, optimizing only a single variable per iteration. As the plot shows, the search begins from an initial guess, (1, -5) in the example, and reaches the minimum by optimizing the function along the x and y dimensions alternately until the overall optimum is reached. Applied to outliers, the same one-variable-at-a-time idea means checking each variable separately for extreme values.

  • Multivariate search method: Multivariate statistical methods analyze the joint behavior of more than one random variable, so they can flag observations whose combination of values is unusual even when each value looks normal on its own. A wide range of multivariate techniques is available.

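  • Minkowski error: The Minkowski error is a loss index that is less sensitive to outliers than the standard mean squared error. The mean squared error squares each instance's error, so outliers make a disproportionately large contribution to the total error. The Minkowski error instead raises each error to a power smaller than 2 (1.5 is a common choice), which reduces that contribution.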
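The difference between the two loss indices can be sketched numerically. The code below assumes the Minkowski error is the mean of absolute errors raised to a power `p`, with `p = 1.5` as an illustrative choice and `p = 2` giving the mean squared error; the function names and the error values are hypothetical.

```python
def mean_power_error(y_true, y_pred, p):
    """Mean of |error|**p; p=2 is the mean squared error."""
    return sum(abs(t - q) ** p for t, q in zip(y_true, y_pred)) / len(y_true)

def outlier_share(errors, p):
    """Fraction of the total loss contributed by the largest error."""
    contributions = [abs(e) ** p for e in errors]
    return max(contributions) / sum(contributions)

# Four small errors and one outlier error (made-up values).
errors = [1, 1, 1, 1, 10]

share_mse = outlier_share(errors, p=2)          # squared error
share_minkowski = outlier_share(errors, p=1.5)  # assumed Minkowski exponent
print("outlier share with p=2:  ", share_mse)
print("outlier share with p=1.5:", share_minkowski)
```

With the smaller exponent the outlier still matters, but it accounts for a smaller fraction of the total error, so it pulls the trained model less.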