Hyperparameter Tuning for KNORA-2

swapneel-panda-419bc751 · 15 June 2021 07:56

Explore Algorithms for Classifier Pool
The choice of algorithms used in the pool for KNORA is another important hyperparameter.

By default, bagged decision trees are used, as they have proven to be an effective approach on a range of classification tasks. Nevertheless, a custom pool of classifiers can be considered.

In the majority of DS publications, the pool of classifiers is generated using either well known ensemble generation methods such as Bagging, or by using heterogeneous classifiers.

— Dynamic Classifier Selection: Recent Advances And Perspectives, 2018.

This requires first defining a list of classifier models to use and fitting each on the training dataset. Unfortunately, this means that the automatic k-fold cross-validation model evaluation methods in scikit-learn cannot be used in this case. Instead, we will use a train-test split so that we can fit the classifier pool manually on the training dataset.

The list of fit classifiers can then be specified to the KNORA-Union (or KNORA-Eliminate) class via the “pool_classifiers” argument. In this case, we will use a pool that includes logistic regression, a decision tree, and a naive Bayes classifier.

The complete example of evaluating the KNORA ensemble and a custom set of classifiers on the synthetic dataset is listed below.

# evaluate KNORA-U dynamic ensemble selection with a custom pool of algorithms
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deslib.des.knora_u import KNORAU
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
# define the synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# define classifiers to use in the pool
classifiers = [
    LogisticRegression(),
    DecisionTreeClassifier(),
    GaussianNB()]
# fit each classifier on the training set
for c in classifiers:
    c.fit(X_train, y_train)
# define the KNORA-U model
model = KNORAU(pool_classifiers=classifiers)
# fit the model
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % (score))

Running the example reports the accuracy of the model with the custom pool of classifiers on the test set.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved an accuracy of about 91.3 percent.

Accuracy: 0.913
For the KNORA model to be worth adopting, it must perform better than every contributing model. Otherwise, we would simply use the best contributing model on its own.

We can check this by evaluating the performance of each contributing classifier on the test set.

...
# evaluate contributing models
for c in classifiers:
    yhat = c.predict(X_test)
    score = accuracy_score(y_test, yhat)
    print('>%s: %.3f' % (c.__class__.__name__, score))

The updated example of KNORA with a custom pool of classifiers that are also evaluated independently is listed below.

# evaluate KNORA-U dynamic ensemble selection with a custom pool of algorithms
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deslib.des.knora_u import KNORAU
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
# define the synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# define classifiers to use in the pool
classifiers = [
    LogisticRegression(),
    DecisionTreeClassifier(),
    GaussianNB()]
# fit each classifier on the training set
for c in classifiers:
    c.fit(X_train, y_train)
# define the KNORA-U model
model = KNORAU(pool_classifiers=classifiers)
# fit the model
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % (score))
# evaluate contributing models
for c in classifiers:
    yhat = c.predict(X_test)
    score = accuracy_score(y_test, yhat)
    print('>%s: %.3f' % (c.__class__.__name__, score))

Running the example first reports the accuracy of the model with the custom pool of classifiers, followed by the accuracy of each contributing model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that KNORA-U again achieves an accuracy of about 91.3 percent, which is better than any contributing model.

Accuracy: 0.913
>LogisticRegression: 0.878
>DecisionTreeClassifier: 0.885
>GaussianNB: 0.873
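
The adoption rule described earlier can be written out explicitly using the accuracies reported above. This is only an illustrative sketch. The variable names are made up for the example, and the numbers are copied from the single run shown here, so they will differ on other runs.

```python
# accuracies copied from the run reported above
ensemble_score = 0.913
member_scores = {
    'LogisticRegression': 0.878,
    'DecisionTreeClassifier': 0.885,
    'GaussianNB': 0.873,
}
# find the best single contributing model
best_name = max(member_scores, key=member_scores.get)
best_score = member_scores[best_name]
# adopt the ensemble only if it beats the best member
adopt = ensemble_score > best_score
print('Best member: %s (%.3f)' % (best_name, best_score))
print('Margin: %.3f, adopt KNORA-U: %s' % (ensemble_score - best_score, adopt))
```

Here the decision tree is the strongest single model, and KNORA-U leads it by 0.028, or about 2.8 percentage points.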
Instead of specifying a pool of classifiers, it is also possible to specify a single ensemble algorithm from the scikit-learn library and the KNORA algorithm will automatically use the internal ensemble members as classifiers.

For example, we can use a random forest ensemble with 1,000 members as the base classifiers to consider within KNORA as follows:

...
# define classifiers to use in the pool
pool = RandomForestClassifier(n_estimators=1000)
# fit the classifiers on the training set
pool.fit(X_train, y_train)
# define the KNORA-U model
model = KNORAU(pool_classifiers=pool)
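
To make "internal ensemble members" concrete: a fitted scikit-learn random forest exposes its individual trees through the `estimators_` attribute, and these fitted trees are what a pool built from the forest consists of. The sketch below uses only scikit-learn and a deliberately small forest and dataset. The sizes and seeds are assumptions chosen to keep it quick, not values from the example above.

```python
# sketch: inspect the fitted members of a random forest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# small dataset and forest (assumptions, for speed)
X, y = make_classification(n_samples=500, n_features=20, n_informative=15, n_redundant=5, random_state=7)
pool = RandomForestClassifier(n_estimators=25, random_state=1)
pool.fit(X, y)
# each entry is one fitted decision tree
members = pool.estimators_
print('Members: %d' % len(members))
print('Member type: %s' % type(members[0]).__name__)
```

With `n_estimators=1000`, as in the text, the pool would contain 1,000 such trees.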
Tying this together, the complete example of KNORA-U with random forest ensemble members as classifiers is listed below.

# evaluate KNORA-U with a random forest ensemble as the classifier pool
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from deslib.des.knora_u import KNORAU
from sklearn.ensemble import RandomForestClassifier
# define the synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
# define classifiers to use in the pool
pool = RandomForestClassifier(n_estimators=1000)
# fit the classifiers on the training set
pool.fit(X_train, y_train)
# define the KNORA-U model
model = KNORAU(pool_classifiers=pool)
# fit the model
model.fit(X_train, y_train)
# make predictions on the test set
yhat = model.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % (score))
# evaluate the standalone model
yhat = pool.predict(X_test)
score = accuracy_score(y_test, yhat)
print('>%s: %.3f' % (pool.__class__.__name__, score))

Running the example first reports the accuracy of the KNORA-U model that uses the random forest members as its pool, followed by the accuracy of the random forest on its own.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the KNORA model with dynamically selected ensemble members slightly out-performs the random forest with the statically selected (full set) ensemble members, at about 96.8 percent versus 96.7 percent.

Accuracy: 0.968
>RandomForestClassifier: 0.967
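
To put the size of this gap in context, a rough back-of-the-envelope calculation can help. The sketch below is an illustration rather than a formal test. It works from the rounded accuracies above, and its simple binomial standard error ignores the fact that both models are scored on the same test set.

```python
# sketch: how large is a 0.001 accuracy gap on this test set?
from math import sqrt

# dataset size and split fraction taken from the example above
n_samples = 10000
test_size = 0.5
n_test = int(n_samples * test_size)
# rounded accuracies from the reported run
knora_acc, forest_acc = 0.968, 0.967
# gap expressed as a number of test predictions
diff_predictions = (knora_acc - forest_acc) * n_test
# approximate binomial standard error of a single accuracy estimate
se = sqrt(knora_acc * (1.0 - knora_acc) / n_test)
print('Test examples: %d' % n_test)
print('Gap in correct predictions: about %.0f' % diff_predictions)
print('Approx. standard error of accuracy: %.4f' % se)
```

A gap of roughly five predictions out of 5,000 is smaller than the approximate standard error of either accuracy estimate, which is one reason to repeat the run several times before preferring one model over the other.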