Precision and Recall vs Sensitivity and Specificity

When we need to express model performance in two numbers, sensitivity and specificity are an alternative pair of metrics to precision and recall. This pair is commonly used for medical devices, such as virus testing kits and pregnancy tests. You can often find the manufacturer’s stated sensitivity and specificity for a device or testing kit printed on the side of the box, or in the instruction leaflet.

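Sensitivity and specificity are defined as follows, where tp, fp, fn and tn are the numbers of true positives, false positives, false negatives and true negatives. Note that sensitivity is equivalent to recall:

$$\text{sensitivity} = \frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fn}}, \qquad \text{specificity} = \frac{\mathrm{tn}}{\mathrm{tn} + \mathrm{fp}}$$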

Specificity also uses tn, the number of true negatives. This means that sensitivity and specificity use all four numbers in the confusion matrix, as opposed to precision and recall which only use three.
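As a minimal sketch of how the four confusion-matrix counts feed into each metric, the Python function below computes all four. The function name `binary_metrics` and the example counts are illustrative assumptions, and the code assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the four metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (illustrative sketch only).
    """
    return {
        "precision": tp / (tp + fp),    # uses tp and fp
        "recall": tp / (tp + fn),       # uses tp and fn
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),  # the only metric that uses tn
    }


# Hypothetical counts, chosen only for illustration.
metrics = binary_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall together touch only tp, fp and fn, while sensitivity and specificity together touch all four counts.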

The number of true negatives corresponds to the number of patients identified by the test as not having the disease when they did not have the disease, or alternatively the number of irrelevant documents which the search engine did not retrieve.

Taking a probabilistic interpretation, we can view specificity as the probability of a negative test given that the patient is well, while the sensitivity is the probability of a positive test given that the patient has the disease.
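The probabilistic reading can be illustrated by estimating both conditional probabilities as simple frequencies over labelled test results. This is only a sketch: the toy patient records and the helper `conditional_frequency` are hypothetical and were invented for this example.

```python
# Hypothetical toy data: (has_disease, test_positive) for eight patients.
patients = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, False), (False, False),
    (False, True),
]


def conditional_frequency(records, outcome, condition):
    """Fraction of the records satisfying `condition` that also satisfy `outcome`."""
    matching = [r for r in records if condition(r)]
    return sum(1 for r in matching if outcome(r)) / len(matching)


# Sensitivity: estimated P(positive test | patient has the disease).
sensitivity = conditional_frequency(
    patients, outcome=lambda r: r[1], condition=lambda r: r[0]
)

# Specificity: estimated P(negative test | patient is well).
specificity = conditional_frequency(
    patients, outcome=lambda r: not r[1], condition=lambda r: not r[0]
)

print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Each estimate conditions on the patient's true state, not on the test result, which is why neither value depends on how common the disease is in the sample.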

Sensitivity and specificity are preferred to precision and recall in the medical domain, while precision and recall are the most commonly used metrics for information retrieval. This initially seems strange, since both pairs of metrics are measuring the same thing: the performance of a binary classifier.

The reason for this discrepancy is that when we are measuring the performance of a search engine, we care about the relevant documents and the returned results, so precision and recall are measured in terms of the true positives, false positives and false negatives, and ignore the very large number of irrelevant documents that were correctly not retrieved. However, if we are testing a medical device, it is important to take into account the number of true negatives, since these represent the large number of patients who do not have the disease and were correctly categorized by the device.
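To see the difference concretely, the sketch below holds the true positives, false positives and false negatives fixed and changes only the number of true negatives. All counts are hypothetical, and the helper functions are written just for this illustration.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Hypothetical counts: only the number of true negatives changes.
tp, fp, fn = 90, 30, 10
results = [
    (tn, precision(tp, fp), recall(tp, fn), specificity(tn, fp))
    for tn in (870, 87_000)
]

for tn, p, r, s in results:
    print(f"tn={tn}: precision={p:.3f} recall={r:.3f} specificity={s:.4f}")
```

Precision and recall are identical in both rows because neither formula contains tn, whereas specificity rises as the number of correctly rejected cases grows.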