Blending Ensemble

Blending is a stacked generalization (stacking) ensemble with a specific configuration.

A limitation of stacking is that there is no generally accepted configuration. This can make the method challenging for beginners, because essentially any model can be used as a base model or as the meta-model, and any resampling method can be used to prepare the training dataset for the meta-model.

Blending is a specific stacking ensemble that makes two prescriptions.

The first is to use a holdout validation dataset to prepare the out-of-sample predictions used to train the meta-model. The second is to use a linear model as the meta-model.

The technique was born out of the requirements of practitioners working on machine learning competitions. These competitions involve the development of a very large number of base learner models, perhaps from different sources (or teams of people). Validating all of those models on the same k-fold cross-validation partitions of the dataset may be too computationally expensive and too challenging to coordinate, whereas a single shared holdout set is simple to agree on.

  • Member Predictions: Out-of-sample predictions on a validation dataset.
  • Combine With Model: Linear model (e.g. linear regression or logistic regression).
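The two prescriptions above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation. It assumes scikit-learn, and the synthetic dataset, the two base models, the split sizes, and the helper name member_predictions are all arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Hold back a test set, then split the remainder into train and validation.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1
)

# Base models are fit on the training split only (arbitrary example models).
base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]
for model in base_models:
    model.fit(X_train, y_train)


def member_predictions(models, X):
    """Hypothetical helper: one column of positive-class probabilities per model."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


# Member predictions: out-of-sample predictions on the validation dataset.
meta_X_val = member_predictions(base_models, X_val)

# Combine with model: a linear meta-model trained on those predictions.
blender = LogisticRegression()
blender.fit(meta_X_val, y_val)

# To predict, pass new data through the base models, then through the blender.
meta_X_test = member_predictions(base_models, X_test)
y_pred = blender.predict(meta_X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Blending accuracy: {acc:.3f}")
```

The validation split is never seen by the base models during fitting, which is what makes the meta-model's training inputs out-of-sample. For a regression problem, the same structure would apply with regressors as base models and linear regression as the blender.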

Given the popularity of blending ensembles, stacking has sometimes come to refer specifically to the use of k-fold cross-validation to prepare out-of-sample predictions for the meta-model.
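For contrast, the k-fold variant can be sketched as follows. This is an illustrative sketch under the same assumptions as before (scikit-learn, synthetic data, arbitrary base models). It uses cross_val_predict so that each row's prediction comes from a model that did not see that row during fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and base models are assumptions for illustration only.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)
base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=1)]

# Out-of-fold predictions: every row is predicted by a model trained on the
# other folds, so the whole training set yields out-of-sample predictions.
oof = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# The meta-model is trained on the out-of-fold predictions.
meta_model = LogisticRegression().fit(oof, y)
```

The trade-off is cost: each base model is fit once per fold rather than once, which is the overhead that blending avoids by using a single holdout set. Before predicting on new data, the base models would still need to be refit on the full training set. scikit-learn's StackingClassifier packages this cross-validated procedure if a ready-made version is preferred.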