Weighted Average Ensemble

Weighted average or weighted sum ensemble is an ensemble machine learning approach that combines the predictions from multiple models, where the contribution of each model is weighted proportionally to its capability or skill.

The weighted average ensemble is related to the voting ensemble.

Voting ensembles are composed of multiple machine learning models where the predictions from each model are averaged directly. For regression, this involves calculating the arithmetic mean of the predictions made by ensemble members. For classification, this may involve calculating the statistical mode (most common class label) or similar voting scheme or summing the probabilities predicted for each class and selecting the class with the largest summed probability.

A limitation of the voting ensemble technique is that it assumes that all models in the ensemble are equally effective. This may not be the case as some models may be better than others, especially if different machine learning algorithms are used to train each model ensemble member.

An alternative to voting is to assume that ensemble members are not all equally capable and instead some models are better than others and should be given more votes or more of a seat when making a prediction. This provides the motivation for the weighted sum or weighted average ensemble method.

In regression, an average prediction is calculated using the arithmetic mean, such as the sum of the predictions divided by the total predictions made. For example, if an ensemble had three ensemble members, the reductions may be:

  • Model 1 : 97.2
  • Model 2 : 100.0
  • Model 3 : 95.8

The mean prediction would be calculated as follows:

  • yhat = (97.2 + 100.0 + 95.8) / 3
  • yhat = 293 / 3
  • yhat = 97.666

A weighted average prediction involves first assigning a fixed weight coefficient to each ensemble member. This could be a floating-point value between 0 and 1, representing a percentage of the weight. It could also be an integer starting at 1, representing the number of votes to give each model.

For example, we may have the fixed weights of 0.84, 0.87, 0.75 for the ensemble member. These weights can be used to calculate the weighted average by multiplying each prediction by the model’s weight to give a weighted sum, then dividing the value by the sum of the weights. For example:

  • yhat = ((97.2 * 0.84) + (100.0 * 0.87) + (95.8 * 0.75)) / (0.84 + 0.87 + 0.75)
  • yhat = (81.648 + 87 + 71.85) / (0.84 + 0.87 + 0.75)
  • yhat = 240.498 / 2.46
  • yhat = 97.763

We can see that as long as the scores have the same scale, and the weights have the same scale and are maximizing (meaning that larger weights are better), the weighted sum results in a sensible value, and in turn, the weighted average is also sensible, meaning the scale of the outcome matches the scale of the scores.

This same approach can be used to calculate the weighted sum of votes for each crisp class label or the weighted sum of probabilities for each class label on a classification problem.

The challenging aspect of using a weighted average ensemble is how to choose the relative weighting for each ensemble member.

There are many approaches that can be used. For example, the weights may be chosen based on the skill of each model, such as the classification accuracy or negative error, where large weights mean a better-performing model. Performance may be calculated on the dataset used for training or a holdout dataset, the latter of which may be more relevant.

The scores of each model can be used directly or converted into a different value, such as the relative ranking for each model. Another approach might be to use a search algorithm to test different combinations of weights.