Classification Predictive Modeling in machine learning

Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y).

The output variables are often called labels or categories. The mapping function predicts the class or category for a given observation.

For example, an email of text can be classified as belonging to one of two classes: “spam and “ not spam “.

  • A classification problem requires that examples be classified into one of two or more classes.
  • A classification can have real-valued or discrete input variables.
  • A problem with two classes is often called a two-class or binary classification problem.
  • A problem with more than two classes is often called a multi-class classification problem.
  • A problem where an example is assigned multiple classes is called a multi-label classification problem.

It is common for classification models to predict a continuous value as the probability of a given example belonging to each output class. The probabilities can be interpreted as the likelihood or confidence of a given example belonging to each class. A predicted probability can be converted into a class value by selecting the class label that has the highest probability.

For example, a specific email of text may be assigned the probabilities of 0.1 as being “spam” and 0.9 as being “not spam”. We can convert these probabilities to a class label by selecting the “not spam” label as it has the highest predicted likelihood.

There are many ways to estimate the skill of a classification predictive model, but perhaps the most common is to calculate the classification accuracy.

The classification accuracy is the percentage of correctly classified examples out of all predictions made.

For example, if a classification predictive model made 5 predictions and 3 of them were correct and 2 of them were incorrect, then the classification accuracy of the model based on just these predictions would be:

accuracy = correct predictions / total predictions * 100
accuracy = 3 / 5 * 100
accuracy = 60%

An algorithm that is capable of learning a classification predictive model is called a classification algorithm.