Although the ROC curve is widely used, it does not reflect the true baseline on an imbalanced classification problem. The reason is the same one that makes accuracy a poor metric there: the many true negatives dominate the result, so precision and recall are more informative. Just as the area under the ROC curve (ROC AUC) is used to compare models, the area under the precision-recall curve (PR AUC) is the metric for comparing models in the imbalanced case. The optimal threshold, however, corresponds to the point closest to the top-right corner of the precision-recall plot, rather than the top-left corner as on a ROC plot.
In a classification problem, it can be more flexible to predict the probability of an observation belonging to each class rather than predicting the class directly.
This flexibility comes from the fact that probabilities can be interpreted using different thresholds, which lets the operator of the model trade off the kinds of errors it makes, such as the number of false positives against the number of false negatives. This matters when the cost of one type of error outweighs the cost of the others.
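
To make the threshold trade-off concrete, here is a minimal sketch in plain Python. The labels, the probabilities, and the helper name `confusion_counts` are made up for illustration and do not come from any particular model or library.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, fn, tn), predicting positive when prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical true labels and predicted probabilities of the positive class.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.80, 0.95]

for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, raising the threshold from 0.3 to 0.7 reduces the false positives from 3 to 1 while increasing the false negatives from 0 to 2, which is the trade-off the operator has to weigh.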
Two diagnostic tools that help in interpreting probabilistic forecasts for binary (two-class) classification problems are ROC curves and precision-recall curves.
- ROC Curves summarize the trade-off between the true positive rate and false positive rate for a predictive model using different probability thresholds.
- Precision-recall curves summarize the trade-off between the true positive rate (recall) and the positive predictive value (precision) for a predictive model using different probability thresholds.
- ROC curves are appropriate when the observations are balanced between the classes, whereas precision-recall curves are better suited to imbalanced datasets.
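
The quantities in the list above can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the data is invented (2 positives among 10 observations), and the helper names `curve_points` and `roc_auc` are hypothetical. In practice a library such as scikit-learn would normally be used instead.

```python
def curve_points(y_true, y_prob):
    """Use each observed score as a threshold; return ROC and PR points."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    roc, pr = [], []
    for threshold in sorted(set(y_prob), reverse=True):
        # Labels of the observations predicted positive at this threshold.
        predicted_pos = [y for y, p in zip(y_true, y_prob) if p >= threshold]
        tp = sum(predicted_pos)
        fp = len(predicted_pos) - tp
        recall = tp / positives        # true positive rate
        fpr = fp / negatives           # false positive rate
        precision = tp / (tp + fp)     # positive predictive value
        roc.append((fpr, recall))
        pr.append((recall, precision))
    return roc, pr


def roc_auc(roc):
    """Area under the ROC points by the trapezoidal rule, starting at (0, 0)."""
    points = [(0.0, 0.0)] + roc
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical imbalanced data: 2 positives, 8 negatives.
y_true = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

roc, pr = curve_points(y_true, y_prob)
pr_baseline = sum(y_true) / len(y_true)  # no-skill precision = share of positives

print("ROC AUC:", roc_auc(roc))
print("best precision at any threshold:", max(p for _, p in pr))
print("no-skill precision baseline:", pr_baseline)
```

With these made-up scores the ROC AUC is 0.8125, which looks comfortable next to the no-skill value of 0.5, yet no threshold achieves a precision above 0.5, and the no-skill precision baseline is only 0.2. That gap is why the precision-recall view is the more informative one when positives are rare.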