Dynamic Ensemble Selection (DES) for Classification in Python

Dynamic ensemble selection is an ensemble learning technique that automatically selects a subset of ensemble members just-in-time when making a prediction.

The technique involves fitting multiple machine learning models on the training dataset, then selecting the models that are expected to perform best when making a prediction for a specific new example, based on the details of the example to be predicted.

This can be achieved using a k-nearest neighbor model to locate examples in the training dataset that are closest to the new example to be predicted, evaluating all models in the pool on this neighborhood and using the models that perform the best on the neighborhood to make a prediction for the new example.

As such, the dynamic ensemble selection can often perform better than any single model in the pool and better than averaging all members of the pool, so-called static ensemble selection.

In this tutorial, you will discover how to develop dynamic ensemble selection models in Python.

After completing this tutorial, you will know:

  • Dynamic ensemble selection algorithms automatically choose ensemble members when making a prediction on new data.
  • How to develop and evaluate dynamic ensemble selection models for classification tasks using the scikit-learn API.
  • How to explore the effect of dynamic ensemble selection model hyperparameters on classification accuracy.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Dynamic Ensemble Selection (DES) in Python

Dynamic Ensemble Selection (DES) in Python
Photo by Simon Harrod, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Dynamic Ensemble Selection
  2. k-Nearest Neighbor Oracle (KNORA) With Scikit-Learn
    1. KNORA-Eliminate (KNORA-E)
    2. KNORA-Union (KNORA-U)
  3. Hyperparameter Tuning for KNORA
    1. Explore k in k-Nearest Neighbor
    2. Explore Algorithms for Classifier Pool

Dynamic Ensemble Selection

Multiple Classifier Systems refer to a field of machine learning algorithms that use multiple models to address classification predictive modeling problems.

The first class of multiple classifier systems to find success is referred to as Dynamic Classifier Selection, or DCS for short.

  • Dynamic Classifier Selection: Algorithms that dynamically choose one from among many trained models to make a prediction based on the specific details of the input.

Dynamic Classifier Selection algorithms generally involve partitioning the input feature space in some way and assigning specific models to be responsible for making predictions for each partition. There are a variety of different DCS algorithms and research efforts are mainly focused on how to evaluate and assign classifiers to specific regions of the input space.

After training multiple individual learners, DCS dynamically selects one learner for each test instance. […] DCS makes predictions by using one individual learner.

— Page 93, Ensemble Methods: Foundations and Algorithms, 2012.

A natural extension to DCS is algorithms that select one or more models dynamically in order to make a prediction. That is, selecting a subset or ensemble of classifiers dynamically. These techniques are referred to as dynamic ensemble selection, or DES.

  • Dynamic Ensemble Selection: Algorithms that dynamically choose a subset of trained models to make a prediction based on the specific details of the input.

Dynamic Ensemble Selection algorithms operate much like DCS algorithms, except predictions are made using votes from multiple classifier models instead of a single best model. In effect, each region of the input feature space is owned by a subset of models that perform best in that region.

… given the fact that selecting only one classifier can be highly error-prone, some researchers decided to select a subset of the pool of classifiers rather than just a single base classifier. All base classifiers that obtained a certain competence level are used to compose the EoC, and their outputs are aggregated to predict the label …

Dynamic Classifier Selection: Recent Advances And Perspectives, 2018.

Perhaps the canonical approach to dynamic ensemble selection is the k-Nearest Neighbor Oracle, or KNORA, algorithm as it is a natural extension of the canonical dynamic classifier selection algorithm “Dynamic Classifier Selection Local Accuracy,” or DCS-LA.

DCS-LA involves selecting the k-nearest neighbors from the training or validation dataset for a given new input pattern, then selecting the single best classifier based on its performance in that neighborhood of k examples to make a prediction on the new example.

KNORA was described by Albert Ko, et al. in their 2008 paper titled “From Dynamic Classifier Selection To Dynamic Ensemble Selection.” It is an extension of DCS-LA that selects multiple models that perform well on the neighborhood and whose predictions are then combined using majority voting to make a final output prediction.

For any test data point, KNORA simply finds its nearest K neighbors in the validation set, figures out which classifiers correctly classify those neighbors in the validation set and uses them as the ensemble for classifying the given pattern in that test set.

From Dynamic Classifier Selection To Dynamic Ensemble Selection, 2008.

The selected classifier models are referred to as “oracles“, hence the use of oracle in the name of the method.

The ensemble is considered dynamic because the members are chosen just-in-time conditional on the specific input pattern requiring a prediction. This is opposed to static, where ensemble members are chosen once, such as averaging predictions from all classifiers in the model.

This is done through a dynamic fashion, since different patterns might require different ensembles of classifiers. Thus, we call our method a dynamic ensemble selection.

From Dynamic Classifier Selection To Dynamic Ensemble Selection, 2008.

Two versions of KNORA are described, including KNORA-Eliminate and KNORA-Union.

  • KNORA-Eliminate (KNORA-E): Ensemble of classifiers that achieves perfect accuracy on the neighborhood of the new example, with a reducing neighborhood size until at least one perfect classifier is located.
  • KNORA-Union (KNORA-U): Ensemble of all classifiers that makes at least one correct prediction on the neighborhood with weighted voting and votes proportional to accuracy on the neighborhood.

KNORA-Eliminate, or KNORA-E for short, involves selecting all classifiers that achieve perfect predictions on the neighborhood of k examples in the neighborhood. If no classifier achieves 100 percent accuracy, the neighborhood size is reduced by one and the models are re-evaluated. This process is repeated until one or more models are discovered that has perfect performance, and then used to make a prediction for the new example.

In the case where no classifier can correctly classify all the K-nearest neighbors of the test pattern, then we simply decrease the value of K until at least one classifier correctly classifies its neighbors

From Dynamic Classifier Selection To Dynamic Ensemble Selection, 2008.

KNORA-Union, or KNORA-U for short, involves selecting all classifiers that make at least one correct prediction in the neighborhood. The predictions from each classifier are then combined using a weighted average, where the number of correct predictions in the neighborhood indicates the number of votes assigned to each classifier.

The more neighbors a classifier classifies correctly, the more votes this classifier will have for a test pattern

From Dynamic Classifier Selection To Dynamic Ensemble Selection, 2008.

Now that we are familiar with DES and the KNORA algorithm, let’s look at how we can use it on our own classification predictive modeling projects.

Want to Get Started With Ensemble Learning?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

k-Nearest Neighbor Oracle (KNORA) With Scikit-Learn

The Dynamic Ensemble Library, or DESlib for short, is a Python machine learning library that provides an implementation of many different dynamic classifiers and dynamic ensemble selection algorithms.

DESlib is an easy-to-use ensemble learning library focused on the implementation of the state-of-the-art techniques for dynamic classifier and ensemble selection.

First, we can install the DESlib library using the pip package manager, if it is not already installed.

Once installed, we can then check that the library was installed correctly and is ready to be used by loading the library and printing the installed version.

Running the script will print your version of the DESlib library you have installed.

Your version should be the same or higher. If not, you must upgrade your version of the DESlib library.

The DESlib provides an implementation of the KNORA algorithm with each dynamic ensemble selection technique via the KNORAE and KNORAU classes respectively.

Each class can be used as a scikit-learn model directly, allowing the full suite of scikit-learn data preparation, modeling pipelines, and model evaluation techniques to be used directly.

Both classes use a k-nearest neighbor algorithm to select the neighbor with a default value of k=7.

A bootstrap aggregation (bagging) ensemble of decision trees is used as the pool of classifier models considered for each classification that is made by default, although this can be changed by setting “pool_classifiers” to a list of models.

We can use the make_classification() function to create a synthetic binary classification problem with 10,000 examples and 20 input features.

Running the example creates the dataset and summarizes the shape of the input and output components.

Now that we are familiar with the DESlib API, let’s look at how to use each KNORA algorithm on our synthetic classification dataset.

KNORA-Eliminate (KNORA-E)

We can evaluate a KNORA-Eliminate dynamic ensemble selection algorithm on the synthetic dataset.

In this case, we will use default model hyperparameters, including bagged decision trees as the pool of classifier models and a k=7 for the selection of the local neighborhood when making a prediction.

We will evaluate the model using repeated stratified k-fold cross-validation with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.

The complete example is listed below.

Running the example reports the mean and standard deviation accuracy of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see the KNORA-E ensemble and default hyperparameters achieve a classification accuracy of about 91.5 percent.

We can also use the KNORA-E ensemble as a final model and make predictions for classification.

First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our binary classification dataset.

Running the example fits the KNORA-E dynamic ensemble selection algorithm on the entire dataset and is then used to make a prediction on a new row of data, as we might when using the model in an application.

Now that we are familiar with using KNORA-E, let’s look at the KNORA-Union method.

KNORA-Union (KNORA-U)

We can evaluate a KNORA-Union model on the synthetic dataset.

In this case, we will use default model hyperparameters, including bagged decision trees as the pool of classifier models and a k=7 for the selection of the local neighborhood when making a prediction.

We will evaluate the model using repeated stratified k-fold cross-validation with three repeats and 10 folds. We will report the mean and standard deviation of the accuracy of the model across all repeats and folds.

The complete example is listed below.

Running the example reports the mean and standard deviation accuracy of the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see the KNORA-U dynamic ensemble selection model and default hyperparameters achieve a classification accuracy of about 93.3 percent.

We can also use the KNORA-U model as a final model and make predictions for classification.

First, the model is fit on all available data, then the predict() function can be called to make predictions on new data.

The example below demonstrates this on our binary classification dataset.

Running the example fits the KNORA-U model on the entire dataset and is then used to make a prediction on a new row of data, as we might when using the model in an application.

Now that we are familiar with using the scikit-learn API to evaluate and use KNORA models, let’s look at configuring the model.

Hyperparameter Tuning for KNORA

In this section, we will take a closer look at some of the hyperparameters you should consider tuning for the KNORA model and their effect on model performance.

There are many hyperparameters we can look at for KNORA, although in this case, we will look at the value of k in the k-nearest neighbor model used in the local evaluation of the models, and how to use a custom pool of classifiers.

We will use the KNORA-Union as the basis for these experiments, although the choice of the specific method is arbitrary.

Explore k in k-Nearest Neighbors

The configuration of the k-nearest neighbors algorithm is critical to the KNORA model as it defines the scope of the neighborhood in which each ensemble is considered for selection.

The k value controls the size of the neighborhood and it is important to set it to a value that is appropriate for your dataset, specifically the density of samples in the feature space. A value too small will mean that relevant examples in the training set might be excluded from the neighborhood, whereas values too large may mean that the signal is being washed out by too many examples.

The code example below explores the classification accuracy of the KNORA-U algorithm with k values from 2 to 21.

Running the example first reports the mean accuracy for each configured neighborhood size.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that accuracy increases with the neighborhood size, perhaps to k=10, where it appears to level off.

A box and whisker plot is created for the distribution of accuracy scores for each configured neighborhood size.

We can see the general trend of increasing model performance and k value before reaching a plateau.

Box and Whisker Plots of Accuracy Distributions for k Values in KNORA-U

Box and Whisker Plots of Accuracy Distributions for k Values in KNORA-U

Explore Algorithms for Classifier Pool

The choice of algorithms used in the pool for the KNORA is another important hyperparameter.

By default, bagged decision trees are used, as it has proven to be an effective approach on a range of classification tasks. Nevertheless, a custom pool of classifiers can be considered.

In the majority of DS publications, the pool of classifiers is generated using either well known ensemble generation methods such as Bagging, or by using heterogeneous classifiers.

Dynamic Classifier Selection: Recent Advances And Perspectives, 2018.

This requires first defining a list of classifier models to use and fitting each on the training dataset. Unfortunately, this means that the automatic k-fold cross-validation model evaluation methods in scikit-learn cannot be used in this case. Instead, we will use a train-test split so that we can fit the classifier pool manually on the training dataset.

The list of fit classifiers can then be specified to the KNORA-Union (or KNORA-Eliminate) class via the “pool_classifiers” argument. In this case, we will use a pool that includes logistic regression, a decision tree, and a naive Bayes classifier.

The complete example of evaluating the KNORA ensemble and a custom set of classifiers on the synthetic dataset is listed below.

Running the example first reports the mean accuracy for the model with the custom pool of classifiers.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the model achieved an accuracy of about 91.3 percent.

In order to adopt the KNORA model, it must perform better than any contributing model. Otherwise, we would simply use the contributing model that performs better.

We can check this by evaluating the performance of each contributing classifier on the test set.

The updated example of KNORA with a custom pool of classifiers that are also evaluated independently is listed below.

Running the example first reports the mean accuracy for the model with the custom pool of classifiers and the accuracy of each contributing model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that again the KNORAU achieves an accuracy of about 91.3 percent, which is better than any contributing model.

Instead of specifying a pool of classifiers, it is also possible to specify a single ensemble algorithm from the scikit-learn library and the KNORA algorithm will automatically use the internal ensemble members as classifiers.

For example, we can use a random forest ensemble with 1,000 members as the base classifiers to consider within KNORA as follows:

Tying this together, the complete example of KNORA-U with random forest ensemble members as classifiers is listed below.

Running the example first reports the mean accuracy for the model with the custom pool of classifiers and the accuracy of the random forest model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the KNORA model with dynamically selected ensemble members out-performs the random forest with the statically selected (full set) ensemble members.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Related Tutorials

Papers

Books

APIs

Summary

In this tutorial, you discovered how to develop dynamic ensemble selection models in Python.

Specifically, you learned:

  • Dynamic ensemble selection algorithms automatically choose ensemble members when making a prediction on new data.
  • How to develop and evaluate dynamic ensemble selection models for classification tasks using the scikit-learn API.
  • How to explore the effect of dynamic ensemble selection model hyperparameters on classification accuracy.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Get a Handle on Modern Ensemble Learning!

Ensemble Learning Algorithms With Python

Improve Your Predictions in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner, and much more...

Bring Modern Ensemble Learning Techniques to
Your Machine Learning Projects


See What's Inside

2 Responses to Dynamic Ensemble Selection (DES) for Classification in Python

  1. Avatar
    Ante March 7, 2022 at 11:53 am #

    Thanks, Jason.
    DESlib seems dedicated to classification only
    Is there a similar tool for regression?

  2. Avatar
    RK January 7, 2023 at 8:08 pm #

    I observed in the ‘explore classification tool’ last two code sections. Where the test size s 0.5. I reduced the test size to 0.7 and 0.8 result degraded. Surprisingly, increasing the test set (or reducing the training portions) improves the result. Please explain to me, why?

Leave a Reply