Probabilities summarize the likelihood of an event as a numerical value between 0.0 and 1.0.
When a model predicts class membership, it assigns a probability to each class, and together these probabilities sum to 1.0; for example, a model may predict:
- Red: 0.75
- Green: 0.10
- Blue: 0.15
We can see that class “red” has the highest probability, making it the most likely outcome predicted by the model, and that the probabilities across the classes (0.75 + 0.10 + 0.15) sum to 1.0.
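As a quick check of the example above, the short sketch below stores the three probabilities in a dictionary (the variable names are illustrative assumptions), confirms they sum to 1.0, and picks the most likely class.

```python
import math

# Predicted probabilities from the example above
probs = {"red": 0.75, "green": 0.10, "blue": 0.15}

# The probabilities across all classes should sum to 1.0
total = sum(probs.values())
print(math.isclose(total, 1.0))

# The most likely class is the one with the largest probability
most_likely = max(probs, key=probs.get)
print(most_likely)
```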
The way that the probabilities are combined depends on the outcome that is required.
For example, if probabilities are the required outcome, then the independently predicted probabilities can be combined directly.
Perhaps the simplest approach for combining probabilities is to sum the probabilities for each class across the models and pass the summed values through a softmax function. This ensures that the scores are appropriately normalized, meaning the combined probabilities across the class labels sum to 1.0.
… such outputs – upon proper normalization (such as softmax normalization […]) – can be interpreted as the degree of support given to that class
— Page 8, Ensemble Machine Learning, 2012.
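The sum-then-softmax approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the three rows of `model_probs` are made-up predictions from three hypothetical models for the classes red, green and blue, and the `softmax` helper is written out by hand with NumPy.

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical predictions: one row per model, columns are red, green, blue
model_probs = np.array([
    [0.75, 0.10, 0.15],
    [0.60, 0.25, 0.15],
    [0.50, 0.20, 0.30],
])

# Sum the probabilities for each class across the models
summed = model_probs.sum(axis=0)

# Normalize the summed scores so they sum to 1.0 again
combined = softmax(summed)
print(combined, combined.sum())
```

Note that the softmax output is not the same as simply dividing the summed scores by the number of models; both give values that sum to 1.0, but the individual probabilities differ.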
More commonly we wish to predict a class label from predicted probabilities.
The most common approach is to use voting, where the predicted probabilities represent the vote made by each model for each class. Votes are then summed and a voting method from the previous section can be used, such as selecting the label with the largest summed probability or the largest mean probability.
- Vote Using Mean Probabilities
- Vote Using Sum Probabilities
- Vote Using Weighted Sum Probabilities
Generally, this approach of treating probabilities as votes for choosing a class label is referred to as soft voting.
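The three soft voting variants listed above can be sketched with NumPy as follows. The predictions in `model_probs` and the per-model `weights` are made-up values chosen for illustration; in practice the weights might reflect how much each model is trusted.

```python
import numpy as np

labels = ["red", "green", "blue"]

# Hypothetical predictions: one row per model, columns follow `labels`
model_probs = np.array([
    [0.75, 0.10, 0.15],
    [0.30, 0.45, 0.25],
    [0.20, 0.50, 0.30],
])

# Hypothetical weights, one per model (an assumption for this sketch)
weights = np.array([0.6, 0.2, 0.2])

# Vote using sum probabilities
sum_scores = model_probs.sum(axis=0)
sum_label = labels[int(np.argmax(sum_scores))]

# Vote using mean probabilities
mean_scores = model_probs.mean(axis=0)
mean_label = labels[int(np.argmax(mean_scores))]

# Vote using weighted sum probabilities
weighted_scores = weights @ model_probs
weighted_label = labels[int(np.argmax(weighted_scores))]

print(sum_label, mean_label, weighted_label)
```

With unweighted votes, the sum and the mean always select the same label, because the mean is just the sum divided by the number of models. In this made-up data, two of the three models individually favor green, yet all three soft voting variants select red because the first model is far more confident in its prediction.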