Essence of Stacking Ensembles

The essence of stacking is learning how to combine the predictions of the contributing ensemble members.

In this way, we might think of stacking as assuming that a simple “wisdom of crowds” combination (e.g. averaging) is good but not optimal, and that better results can be achieved if we can identify and give more weight to the experts in the crowd.

The experts and lesser experts are identified based on their skill in new situations, i.e. on out-of-sample data. This is an important distinction from simple averaging and voting, although it introduces complexity that makes the technique challenging to implement correctly: the combining model must be trained on out-of-sample predictions, otherwise data leakage leads to incorrect, optimistic estimates of performance.
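
For example, the out-of-sample predictions used to assess a member can be collected with k-fold cross-validation, so that every prediction was made by a model that did not see that example during fitting. The minimal sketch below assumes scikit-learn and a synthetic dataset; the k-nearest neighbors member is only a placeholder.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# synthetic classification task (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# each example is predicted by a model fit on folds that did not contain it,
# so the predictions used to judge the member are genuinely out-of-sample
base_model = KNeighborsClassifier()
oof_predictions = cross_val_predict(base_model, X, y, cv=5)
print(oof_predictions.shape)  # one out-of-sample prediction per example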

Nevertheless, we can see that stacking is a very general ensemble learning approach.

Broadly conceived, we might think of a weighted average of ensemble models as a generalization and improvement upon voting ensembles, and stacking as a further generalization of a weighted average model.
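
This progression from fixed equal weights, to learned weights, to an arbitrary combining model can be sketched directly, assuming scikit-learn and a synthetic regression task; the member and meta-model algorithms below are illustrative choices only.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

# synthetic regression task (illustrative only)
X, y = make_regression(n_samples=1000, noise=10.0, random_state=1)

# out-of-sample predictions from two diverse ensemble members
members = [DecisionTreeRegressor(random_state=1), KNeighborsRegressor()]
preds = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in members])

# 1. voting/averaging: fixed, equal weights for every member
average = preds.mean(axis=1)

# 2. weighted average: weights learned from out-of-sample performance
weighted = LinearRegression().fit(preds, y)
print(weighted.coef_)  # learned member weights

# 3. stacking: any model may learn the combination, e.g. a decision tree
#    instead of a linear weighting
meta_model = DecisionTreeRegressor(random_state=1).fit(preds, y)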

As such, the structure of the stacking procedure can be divided into three essential elements; they are:

  • Diverse Ensemble Members: Create a diverse set of models that make different predictions.
  • Member Assessment: Evaluate the performance of ensemble members.
  • Combine With Model: Use a model to combine predictions from members.

We can map canonical stacking onto these elements as follows:

  • Diverse Ensemble Members: Use different algorithms to fit each contributing model.
  • Member Assessment: Evaluate model performance on out-of-sample predictions.
  • Combine With Model: Use a machine learning model to combine the out-of-sample predictions.

This provides a framework in which we can consider related ensemble algorithms.
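
As one concrete illustration of the mapping above, the sketch below uses scikit-learn's StackingClassifier on a synthetic dataset; the particular member algorithms and the logistic regression meta-model are placeholder choices, not requirements of the method.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic classification task (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Diverse Ensemble Members: different algorithms fit on the same data
members = [
    ('knn', KNeighborsClassifier()),
    ('tree', DecisionTreeClassifier(random_state=1)),
    ('svm', SVC(random_state=1)),
]

# Member Assessment and Combine With Model: the cv argument collects
# out-of-sample predictions from each member, and the final_estimator
# (here a logistic regression) learns how to combine them
ensemble = StackingClassifier(estimators=members,
                              final_estimator=LogisticRegression(),
                              cv=5)

# evaluate the stacking ensemble itself on out-of-sample data
scores = cross_val_score(ensemble, X, y, cv=5)
print('Mean accuracy: %.3f' % scores.mean())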