k-Nearest Neighbor Oracle (KNORA) With Scikit-Learn

swapneel-panda-419bc751 · 15 June 2021 07:54

The Dynamic Ensemble Library, or DESlib for short, is a Python machine learning library that provides an implementation of many different dynamic classifiers and dynamic ensemble selection algorithms.

DESlib is an easy-to-use ensemble learning library focused on the implementation of the state-of-the-art techniques for dynamic classifier and ensemble selection.

First, we can install the DESlib library using the pip package manager, if it is not already installed.

sudo pip install deslib
Once installed, we can then check that the library was installed correctly and is ready to be used by loading the library and printing the installed version.

# check deslib version
import deslib
print(deslib.__version__)
Running the script prints the version of the DESlib library that you have installed.

Your version should be the same as the one shown below or higher. If not, you must upgrade your version of the DESlib library.

0.3
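The "same or higher" check can also be done in code rather than by eye. The following is a minimal sketch and not part of DESlib: the helper name version_at_least is hypothetical, the minimum of 0.3 is taken from the output above, and the parser only handles plain dotted numeric versions such as 0.3 or 0.3.5.

```python
def version_at_least(installed, minimum):
    """Return True if a dotted numeric version is >= minimum (hypothetical helper)."""
    def parse(text):
        # Keep only the leading numeric components, e.g. "0.3.5" -> (0, 3, 5)
        parts = []
        for piece in text.split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts)

    a, b = parse(installed), parse(minimum)
    # Pad the shorter tuple with zeros so that "0.3" compares equal to "0.3.0"
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    import deslib
    print(deslib.__version__, version_at_least(deslib.__version__, "0.3"))
except ImportError:
    print("deslib is not installed")
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "0.10" would sort before "0.3".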
DESlib provides an implementation of both versions of the KNORA algorithm, KNORA-Eliminate and KNORA-Union, via the KNORAE and KNORAU classes respectively.

Each class can be used as a scikit-learn model directly, allowing the full suite of scikit-learn data preparation, modeling pipelines, and model evaluation techniques to be used directly.

Both classes use a k-nearest neighbor algorithm to select the neighbors of each example to be classified, with a default value of k=7.
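To make the role of k concrete, the sketch below finds the 7 nearest neighbors of a single query point with plain NumPy. It illustrates the idea only and is not DESlib's internal code: the function name nearest_neighbors, the choice of Euclidean distance, and the random toy data are all assumptions made for this example.

```python
import numpy as np


def nearest_neighbors(X_ref, query, k=7):
    """Return the indices of the k rows of X_ref closest to query (Euclidean distance)."""
    distances = np.linalg.norm(X_ref - query, axis=1)
    # A stable sort keeps the lower index first when two distances tie
    return np.argsort(distances, kind="stable")[:k]


# Toy reference set: 100 points with 20 features (illustrative data only)
rng = np.random.default_rng(7)
X_ref = rng.normal(size=(100, 20))
query = rng.normal(size=20)

neighbor_idx = nearest_neighbors(X_ref, query, k=7)
print(neighbor_idx.shape)
```

In KNORA, the members of the classifier pool are then judged by how well they predict these neighbors, so a larger k gives a broader but less local view of each model's competence.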

By default, a bootstrap aggregation (bagging) ensemble of decision trees is used as the pool of classifier models considered for each classification, although this can be changed by setting the “pool_classifiers” argument to a list of models.

We can use the make_classification() function to create a synthetic binary classification problem with 10,000 examples and 20 input features.

# synthetic binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)
# summarize the dataset
print(X.shape, y.shape)
Running the example creates the dataset and summarizes the shape of the input and output components.

(10000, 20) (10000,)
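Since the dataset is described as a binary classification problem, it can be worth confirming the labels directly. The short sketch below, which is an addition and assumes the same make_classification() arguments as above, counts the examples in each class and prints each class's share of the dataset.

```python
from collections import Counter

from sklearn.datasets import make_classification

# Same dataset definition as in the example above
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=7)

# Count how many examples fall into each class label
counts = Counter(y.tolist())
for label in sorted(counts):
    print(label, counts[label], f"{counts[label] / len(y):.1%}")
```

Two labels with similar counts would indicate a roughly balanced problem, in which case plain classification accuracy is a reasonable evaluation metric.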