Explain Class Imbalance Techniques in detail?

Data are said to suffer from the Class Imbalance Problem when the class distributions are highly skewed, i.e., when one class heavily outnumbers the others. In this setting, many classification learning algorithms show low predictive accuracy for the infrequent (minority) class. Cost-sensitive learning is one common approach to this problem; resampling and ensemble methods, covered below, are others.

1| Oversampling

This technique modifies unequal class distributions to create a balanced dataset. When the minority class has too few examples, oversampling balances the data by increasing the number of minority-class samples, either by duplicating existing ones or by generating synthetic ones (as in SMOTE); a sketch follows the advantages below.

Advantages

  • No loss of information, since no majority-class samples are discarded
  • Synthetic variants such as SMOTE mitigate the overfitting that exact duplication can cause
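
A minimal sketch of both plain and synthetic oversampling, assuming the third-party imbalanced-learn package and a synthetic dataset (the dataset and variable names are illustrative, not from the original text):

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Synthetic 2-class dataset with roughly a 9:1 class imbalance
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("original:", Counter(y))

# Random oversampling: duplicate minority samples until the classes are equal
ros = RandomOverSampler(random_state=42)
X_ros, y_ros = ros.fit_resample(X, y)
print("random oversampling:", Counter(y_ros))

# SMOTE: create synthetic minority samples by interpolating between a sample
# and its nearest minority-class neighbours
sm = SMOTE(random_state=42)
X_sm, y_sm = sm.fit_resample(X, y)
print("SMOTE:", Counter(y_sm))
```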

2| Undersampling

Unlike oversampling, this technique balances an imbalanced dataset by reducing the size of the majority class. Various methods exist, such as cluster centroids and Tomek links. The cluster-centroid method replaces clusters of majority-class samples with the centroids found by a K-means algorithm, while the Tomek-link method removes unwanted overlap between classes by deleting majority samples from minimally distanced nearest-neighbour pairs of opposite classes, until all such nearest neighbours belong to the same class. Both are sketched after the advantages below.

Advantages

  • Run time improves because the training set becomes smaller
  • Helps with memory problems when the full dataset is too large to handle
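
A minimal sketch of the two methods named above, again assuming imbalanced-learn and a synthetic dataset:

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import ClusterCentroids, TomekLinks

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Cluster centroids: replace majority-class samples with K-means centroids
cc = ClusterCentroids(random_state=42)
X_cc, y_cc = cc.fit_resample(X, y)
print("cluster centroids:", Counter(y_cc))

# Tomek links: remove the majority-class sample from each minimally distanced
# cross-class nearest-neighbour pair (cleans overlap; does not fully balance)
tl = TomekLinks()
X_tl, y_tl = tl.fit_resample(X, y)
print("Tomek links:", Counter(y_tl))
```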

3| Cost-Sensitive Learning Technique

Cost-Sensitive Learning (CSL) takes misclassification costs into consideration and minimizes the total cost rather than the raw error count. While the usual goal of classification is high accuracy over a set of known classes, CSL assigns different penalties to different error types, so that misclassifying a rare example costs more than misclassifying a common one. It plays an important role in many machine learning algorithms and real-world data mining applications; a sketch follows the advantages below.

Advantages

  • This technique avoids pre-selection of parameters and automatically adjusts the decision hyperplane
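
One common way to apply CSL without a dedicated library is the class_weight option available in many scikit-learn estimators. A minimal sketch, with the 1:9 cost ratio chosen purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Misclassifying the rare class 1 costs 9x as much as misclassifying class 0;
# class_weight="balanced" would instead derive costs from class frequencies.
clf = LogisticRegression(class_weight={0: 1, 1: 9}, max_iter=1000)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```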

4| Ensemble Learning Techniques

The ensemble-based method is another technique used to deal with imbalanced datasets: it combines the results of several classifiers to improve on the performance of any single classifier, for example by training each member on a balanced resample of the data (see the sketch below).

Advantages

  • The combined model is more stable
  • Its predictions are generally better than those of any single classifier
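
A minimal sketch using imbalanced-learn's BalancedRandomForestClassifier, which fits each tree on a balanced bootstrap sample; the dataset and parameter choices are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from imblearn.ensemble import BalancedRandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# An ensemble of trees, each trained on a randomly undersampled (balanced)
# bootstrap of the original data
clf = BalancedRandomForestClassifier(n_estimators=100, random_state=42)

# Balanced accuracy weights both classes equally, unlike plain accuracy
scores = cross_val_score(clf, X, y, scoring="balanced_accuracy", cv=5)
print(scores.mean())
```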

5| Combined Class Methods

In this type of method, several techniques are fused together to handle imbalanced data better. For instance, SMOTE can be combined with other methods, giving variants such as MSMOTE (Modified SMOTE), SMOTEENN (SMOTE with Edited Nearest Neighbours), SMOTE-TL (SMOTE with Tomek links), SMOTE-EL, etc., which also help eliminate noise in imbalanced datasets; two of these are sketched below.

Advantages

  • No loss of useful information
  • Good generalization
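
Two of the combinations above ship with imbalanced-learn; a minimal sketch on a synthetic dataset:

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN, SMOTETomek

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# SMOTE then Edited Nearest Neighbours: oversample, then clean noisy samples
sme = SMOTEENN(random_state=42)
X_se, y_se = sme.fit_resample(X, y)
print("SMOTEENN:", Counter(y_se))

# SMOTE then Tomek links (the SMOTE-TL combination)
smt = SMOTETomek(random_state=42)
X_st, y_st = smt.fit_resample(X, y)
print("SMOTETomek:", Counter(y_st))
```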