Ensemble Pruning Example

In this section, we will explore how to develop a greedy ensemble pruning algorithm from scratch.

We will use a greedy algorithm in this case, which is straightforward to implement. It involves removing one member at a time from the ensemble, evaluating the performance of the ensemble without it, and repeating this for each member. The member whose removal results in the best improvement in performance is then permanently removed from the ensemble and the process repeats. This continues until no further improvements can be achieved.

It is referred to as a “greedy” algorithm because it seeks the best improvement at each step. It is possible that the best combination of members is not on the path of greedy improvements, in which case the greedy algorithm will not find it and a global optimization algorithm could be used instead.
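For example, a global alternative would be to evaluate every non-empty subset of members exhaustively, which is guaranteed to find the best combination but requires 2^n - 1 evaluations. The sketch below is a toy illustration of this idea: score_subset() is a hypothetical stand-in for a real evaluation function, backed by a fixed score table so the example runs without a dataset; in practice it would build and cross-validate a voting ensemble for each subset.

```python
from itertools import combinations

# hypothetical score table standing in for real cross-validation results;
# in practice every subset would be evaluated with a voting ensemble
SCORES = {
	('knn', 'svm'): 0.957,
	('knn', 'lr', 'svm'): 0.940,
	('lr',): 0.870,
}

def score_subset(members):
	# look up the (sorted) subset; unknown subsets score zero in this toy
	return SCORES.get(tuple(sorted(members)), 0.0)

def exhaustive_prune(members):
	# evaluate every non-empty subset and keep the best one
	best_score, best_subset = 0.0, None
	for k in range(1, len(members) + 1):
		for subset in combinations(members, k):
			result = score_subset(subset)
			if result > best_score:
				best_score, best_subset = result, subset
	return best_score, best_subset

score, subset = exhaustive_prune(['lr', 'knn', 'svm'])
print(score, subset)
```

This costs exponentially more evaluations than the greedy approach, which is why the greedy version is usually preferred for more than a handful of members.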

First, we can define a function to evaluate a candidate list of models. This function will take the list of models and the dataset, construct a voting ensemble from the list of models, and evaluate its performance using repeated stratified k-fold cross-validation, returning the mean classification accuracy.

This function can be used to evaluate each candidate’s removal from the ensemble. The evaluate_ensemble() function below implements this.

# evaluate a list of models
def evaluate_ensemble(models, X, y):
	# check for no models
	if len(models) == 0:
		return 0.0
	# create the ensemble
	ensemble = VotingClassifier(estimators=models, voting='soft')
	# define the evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate the ensemble
	scores = cross_val_score(ensemble, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# return mean score
	return mean(scores)
Next, we can define a function that performs a single round of pruning.

First, a baseline in performance is established with all models that are currently in the ensemble. Then the list of models is enumerated, each model is removed in turn, and the effect of its removal on ensemble performance is evaluated. If a removal results in an improvement in performance, the new score and the specific model that was removed are recorded.

Importantly, the trial removal is performed on a copy of the list of models, not on the main list of models itself. This is to ensure we only remove an ensemble member from the list once we know it will result in the best possible improvement from all the members that could potentially be removed at the current step.

The prune_round() function below implements this given the list of current models in the ensemble and dataset, and returns the improvement in score (if any) and the best model to remove to achieve that score.

# perform a single round of pruning the ensemble
def prune_round(models_in, X, y):
	# establish a baseline
	baseline = evaluate_ensemble(models_in, X, y)
	best_score, removed = baseline, None
	# enumerate removing each candidate and see if we can improve performance
	for m in models_in:
		# copy the list of chosen models
		dup = models_in.copy()
		# remove this model
		dup.remove(m)
		# evaluate new ensemble
		result = evaluate_ensemble(dup, X, y)
		# check for new best
		if result > best_score:
			# store the new best
			best_score, removed = result, m
	return best_score, removed
Next, we need to drive the pruning process.

This involves calling the prune_round() function repeatedly, running rounds of pruning until no further improvement in accuracy is achieved.

If the function returns None for the model to be removed, we know that no single greedy improvement is possible and we can return the final list of models. Otherwise, the chosen model is removed from the ensemble and the process continues.

The prune_ensemble() function below implements this and returns the models to use in the final ensemble and the score that it achieved via our evaluation procedure.

# prune an ensemble from scratch
def prune_ensemble(models, X, y):
	best_score = 0.0
	# prune ensemble until no further improvement
	while True:
		# remove one model from the ensemble
		score, removed = prune_round(models, X, y)
		# check for no improvement
		if removed is None:
			print('>no further improvement')
			break
		# keep track of best score
		best_score = score
		# remove model from the list
		models.remove(removed)
		# report results along the way
		print('>%.3f (removed: %s)' % (score, removed[0]))
	return best_score, models
We can tie all of this together into an example of ensemble pruning on the synthetic binary classification dataset.

# example of ensemble pruning for classification
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

# get the dataset
def get_dataset():
	X, y = make_classification(n_samples=5000, n_features=20, n_informative=10, n_redundant=10, random_state=1)
	return X, y

# get a list of models to evaluate
def get_models():
	models = list()
	models.append(('lr', LogisticRegression()))
	models.append(('knn', KNeighborsClassifier()))
	models.append(('tree', DecisionTreeClassifier()))
	models.append(('nb', GaussianNB()))
	models.append(('svm', SVC(probability=True)))
	return models

# evaluate a list of models
def evaluate_ensemble(models, X, y):
	# check for no models
	if len(models) == 0:
		return 0.0
	# create the ensemble
	ensemble = VotingClassifier(estimators=models, voting='soft')
	# define the evaluation procedure
	cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
	# evaluate the ensemble
	scores = cross_val_score(ensemble, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
	# return mean score
	return mean(scores)

# perform a single round of pruning the ensemble
def prune_round(models_in, X, y):
	# establish a baseline
	baseline = evaluate_ensemble(models_in, X, y)
	best_score, removed = baseline, None
	# enumerate removing each candidate and see if we can improve performance
	for m in models_in:
		# copy the list of chosen models
		dup = models_in.copy()
		# remove this model
		dup.remove(m)
		# evaluate new ensemble
		result = evaluate_ensemble(dup, X, y)
		# check for new best
		if result > best_score:
			# store the new best
			best_score, removed = result, m
	return best_score, removed

# prune an ensemble from scratch
def prune_ensemble(models, X, y):
	best_score = 0.0
	# prune ensemble until no further improvement
	while True:
		# remove one model from the ensemble
		score, removed = prune_round(models, X, y)
		# check for no improvement
		if removed is None:
			print('>no further improvement')
			break
		# keep track of best score
		best_score = score
		# remove model from the list
		models.remove(removed)
		# report results along the way
		print('>%.3f (removed: %s)' % (score, removed[0]))
	return best_score, models

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# prune the ensemble
score, model_list = prune_ensemble(models, X, y)
names = ','.join([n for n, _ in model_list])
print('Models: %s' % names)
print('Final Mean Accuracy: %.3f' % score)
Running the example performs the ensemble pruning process, reporting the model removed in each round and the mean accuracy of the ensemble after that removal.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that three rounds of pruning were performed, removing the naive Bayes, decision tree, and logistic regression algorithms, leaving only the SVM and KNN algorithms that achieved a mean classification accuracy of about 95.7 percent. This is better than the 95.3 percent achieved by SVM and KNN used in a standalone manner, and clearly better than combining all models together.

The final list of models could then be used in a new final voting ensemble model via the “estimators” argument, fit on the entire dataset and used to make a prediction on new data.

>0.939 (removed: nb)
>0.948 (removed: tree)
>0.957 (removed: lr)
>no further improvement
Models: knn,svm
Final Mean Accuracy: 0.957
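This final step could look like the following sketch, which assumes the pruned model list found above (KNN and SVM) and the same synthetic dataset, fits the voting ensemble on all available data, and makes a prediction for a single new row:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# same synthetic binary classification dataset as above
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10, n_redundant=10, random_state=1)
# the pruned model list found by the greedy procedure: KNN and SVM
models = [('knn', KNeighborsClassifier()), ('svm', SVC(probability=True))]
# fit the final voting ensemble on the entire dataset
ensemble = VotingClassifier(estimators=models, voting='soft')
ensemble.fit(X, y)
# make a prediction for one new row of data (reusing a known row for illustration)
row = X[0].reshape(1, -1)
yhat = ensemble.predict(row)
print('Predicted class: %d' % yhat[0])
```

In a real application the new row would come from unseen data with the same 20 input features, rather than being drawn from the training set as it is here for convenience.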