Converting a regression problem to classification

Regression models can be very sensitive to outliers. A further practical challenge is that a predicted value may be far from the true value in the extreme ranges, yet still fall on the correct side of the data distribution with respect to the mean. For example (just cooking up an example here), suppose you have a device that estimates heart rate from fingertip images; from a data-science standpoint it is a lot easier to first try to predict whether the pulse rate is below normal, normal, or high.
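As a minimal sketch of that idea, the snippet below bins a simulated heart-rate variable into three classes. The data are made up, and the cutoffs of 60 and 100 bpm are purely illustrative assumptions, not clinical guidance:

```python
import numpy as np
import pandas as pd

# Simulated resting heart rates in beats per minute (made-up data)
rng = np.random.default_rng(0)
heart_rate = rng.normal(loc=75, scale=15, size=500)

# Illustrative cutoffs only; real thresholds depend on age, context,
# and medical guidance
labels = pd.cut(
    heart_rate,
    bins=[-np.inf, 60, 100, np.inf],
    labels=["below_normal", "normal", "high"],
)
print(pd.Series(labels).value_counts())
```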

The definition of what is "normal" is clear from a medical standpoint. However, you may wish to redefine the segments of your response variable, for instance by first applying a clustering technique.
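One way to do that, assuming scikit-learn is available, is to cluster the response variable itself and use the cluster ids as class labels, so the segment boundaries come from the data rather than from fixed medical cutoffs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
heart_rate = rng.normal(loc=75, scale=15, size=500).reshape(-1, 1)

# Cluster the 1-D response; the cluster ids become the class labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(heart_rate)
y_class = km.labels_

# Sorted cluster centers show where the data-driven boundaries fall
print(np.sort(km.cluster_centers_.ravel()))
```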

Some regression models double as classification models - e.g. logistic regression. One can set the cutpoint at any particular probability to obtain a classification. Usually one would choose 0.5, but there may be reasons not to (the costs of the two types of classification error might differ).
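Here is a minimal sketch of moving the cutpoint, using scikit-learn on synthetic data. Raising the threshold from 0.5 to 0.8 (an arbitrary value chosen for illustration) trades false positives for false negatives:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted P(y = 1)

# Default 0.5 cutpoint vs. a stricter one when false positives are costly
default_pred = (proba >= 0.5).astype(int)
strict_pred = (proba >= 0.8).astype(int)
print(default_pred.sum(), strict_pred.sum())  # fewer positives at 0.8
```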

Regression trees turn into classification trees if the dependent variable changes from continuous to categorical. In general, it is not a good idea to turn a continuous dependent variable (as used in regression trees) into a categorical one - doing so loses information. But there might be times when it is necessary (e.g. to support certain kinds of decisions).
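The information loss is easy to see on simulated data: a regression tree predicts a numeric value, while a classifier fit on a binned version of the same target can only return the bin. The bin edges (7 and 14) are arbitrary assumptions here:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y_continuous = 2.0 * X.ravel() + rng.normal(scale=1.0, size=300)

# Same features and target, two framings
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_continuous)

# Binning the target discards all within-bin variation
y_binned = np.digitize(y_continuous, bins=[7.0, 14.0])  # 3 classes: 0, 1, 2
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_binned)

print(reg_tree.predict(X[:3]))  # numeric predictions
print(clf_tree.predict(X[:3]))  # class labels only
```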

Similarly, once you categorize the dependent variable, linear regression becomes inappropriate and a logistic regression model is the better choice.