A Gentle Introduction to Ensemble Diversity for Machine Learning

Ensemble learning combines the predictions from machine learning models for classification and regression.

We pursue using ensemble methods to achieve improved predictive performance, and it is this improvement over any of the contributing models that defines whether an ensemble is good or not.

A property that is present in a good ensemble is the diversity of the predictions made by contributing models. Diversity is a slippery concept as it has not been precisely defined; nevertheless, it provides a useful practical heuristic for designing good ensemble models.

In this post, you will discover ensemble diversity in machine learning.

After reading this post, you will know:

  • A good ensemble is one that has better performance than any contributing model.
  • Ensemble diversity is a property of a good ensemble where contributing models make different errors for the same input.
  • Seeking independent models and uncorrelated predictions provides a guide for thinking about and introducing diversity into ensemble models.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to Ensemble Diversity for Machine Learning

A Gentle Introduction to Ensemble Diversity for Machine Learning
Photo by Bernd Thaller, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. What Makes a Good Ensemble
  2. What Is Ensemble Diversity
  3. Methods for Increasing Diversity

What Makes a Good Ensemble

An ensemble is a machine learning model that combines the predictions from multiple other models.

This often has the effect of reducing prediction error and improving the generalization of the model. But this is not always the case.

Sometimes the ensemble performs no better than a well-performing contributing member to the ensemble. Even worse, sometimes an ensemble will perform worse than any contributing member.

This raises the question as to what makes a good ensemble.

A good ensemble is an ensemble that performs better than any contributing member. That is, it is a model that has lower prediction error for regression or higher accuracy for classification.

  • Good Ensemble: A model that performs better than any single contributing model.

This can be evaluated empirically using a train and test set or a resampling technique like k-fold cross-validation. The results can similarly be estimated for each contributing model and the results compared directly to see if the definition of a “good ensemble” is met.

What properties of a good ensemble differentiate it from other ensembles that perform as good or worse than any contributing member?

This is a widely studied question with many ideas. The consistency is that a good ensemble has diversity.

What Is Ensemble Diversity

Ensemble diversity refers to differences in the decisions or predictions made by the ensemble members.

Ensemble diversity, that is, the difference among the individual learners, is a fundamental issue in ensemble methods.

— Page 99, Ensemble Methods, 2012.

Two ensemble members that make identical predictions are considered not diverse.

Ensembles that make completely different predictions in all cases are maximally diverse, although this is most unlikely.

Making different errors on any given sample is of paramount importance in ensemble-based systems. After all, if all ensemble members provide the same output, there is nothing to be gained from their combination.

— Page 5, Ensemble Machine Learning, 2012.

Therefore, some level of diversity in predictions is desired or even required in order to construct a good ensemble. This is often simplified in discussion to the desire for diverse ensemble members, e.g. the models themselves, that in turn will produce diverse predictions, although diversity in predictions is truly what we seek.

Ideally, diversity would mean that the predictions made by each ensemble member are independent and uncorrelated.

… classifier outputs should be independent or preferably negatively correlated.

— Page 5, Ensemble Machine Learning, 2012.

Independence is a term from probability theory and refers to the case where one event does not influence the probability of the occurrence of another event. Events can influence each other in many different ways. One ensemble may influence another if it attempts to correct the predictions made by it.

As such, depending on the ensemble type, models may be naturally dependent or independent.

  • Independence: Whether the occurrence of an event affects the probability of subsequent events.

Correlation is a term from statistics and refers to two variables changing together. It is common to calculate a normalized correlation score between -1.0 and 1.0 where a score of 0.0 indicates no correlation, e.g. uncorrelated. A score of 1.0 or -1.0 one indicates perfect positive and negative correlation respectively and indicates two models that always predict the same (or inverse) outcome.

… if the learners are independent, i.e., [correlation] = 0, the ensemble will achieve a factor of T of error reduction than the individual learners; if the learners are totally correlated, i.e., [correlation] = 1, no gains can be obtained from the combination.

— Page 99, Ensemble Methods, 2012.

In practice, ensembles often exhibit a weak or modest positive correlation in their predictions.

  • Correlation: The degree to which variables change together.

As such, ensemble models are constructed from constituent models that may or may not have some dependency and some correlation between the decisions or predictions made. Constructing good ensembles is a challenging task.

For example, combining a bunch of top-performing models will likely result in a poor ensemble as the predictions made by the models will be highly correlated. Unintuitively, you might be better off combining the predictions from a few top-performing individual models with the prediction from a few weaker models.

So, it is desired that the individual learners should be accurate and diverse. Combining only accurate learners is often worse than combining some accurate ones together with some relatively weak ones, since complementarity is more important than pure accuracy.

— Page 100, Ensemble Methods, 2012.

Sadly, there is no generally agreed upon measure of ensemble diversity and ideas of independence and correlation are only guides or proxies for thinking about the properties of good ensembles.

Unfortunately, though diversity is crucial, we still do not have a clear understanding of diversity; for example, currently there is no well-accepted formal definition of diversity.

— Page 100, Ensemble Methods, 2012.

Nevertheless, as guides, they are useful as we can devise techniques that attempt to reduce the correlation between the models.

Want to Get Started With Ensemble Learning?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Methods for Increasing Diversity

The objective for developing good ensembles is considered through the lens of increasing the diversity of ensemble members.

Models in an ensemble may be more dependent if they share the same algorithm used to train the model and/or the same dataset used to train the model. Models in an ensemble may have higher correlation between their predictions for the same general reasons.

Therefore, approaches that make the models and/or the training data more different are desirable.

Diversity may be obtained through different presentations of the input data, as in bagging, variations in learner design, or by adding a penalty to the outputs to encourage diversity.

— Page 93, Pattern Classification Using Ensemble Methods, 2010.

It can be helpful to have a framework for thinking about techniques for managing the diversity of ensembles, and there are many to choose from.

For example, Zhi-Hua Zhou in Chapter 5 of his 2012 book titled “Ensemble Methods: Foundations and Algorithms” proposes a framework of four approaches to generating diversity. In summary, they are:

  • Data Sample Manipulation: e.g. sample the training dataset differently for each model.
  • Input Feature Manipulation: e.g. train each model on different groups of input features.
  • Learning Parameter Manipulation: e.g. train models with different hyperparameter values.
  • Output Representation Manipulation: e.g. train models with differently modified target values.

Perhaps the most common approaches are changes to the training data, that is data samples and input features, often combined in a single approach.

For another example of a taxonomy for generating a diverse ensemble, Lior Rokach in Chapter 4 his 2010 book titled “Pattern Classification Using Ensemble Methods” suggest a similar taxonomy, summarized as:

  • Manipulating the Inducer, e.g. manipulating in how models are trained.
    • Vary hyperparameters.
    • Varying starting point.
    • Vary optimization algorithm.
  • Manipulating the Training Sample, e.g. manipulating the data used for training.
    • Resampling.
  • Changing the target attribute representation, e.g. manipulating the target variable.
    • Vary encoding.
    • Error-correcting codes.
    • Label switching.
  • Partitioning the search space, e.g. manipulating the number of input features.
    • Random subspace.
    • Feature selection.
  • Hybridization, e.g. varying model types or a mixture of the above methods.

The hybridization approach is common both in popular ensemble methods that combine multiple approaches at generating diversity in a single algorithm, as well as simply varying the algorithms (model types) that comprise the ensemble.

Bundling competing models into ensembles almost always improves generalization – and using different algorithms as the perturbation operator is an effective way to obtain the requisite diversity of components.

— Page 89, Ensemble Methods in Data Mining, 2010.

This highlights that although we could investigate quantifying how diverse an ensemble is, that instead, most effort is on developing techniques that generate diversity.

We can harness this knowledge in developing our own ensemble methods, first trying standard and well-understood ensemble learning methods, then tailoring them for our specific dataset with model independence and correlation of predictions in mind in an effort to get the most out of them.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.




In this post, you discovered ensemble diversity in machine learning.

Specifically, you learned:

  • A good ensemble is one that has better performance than any contributing model.
  • Ensemble diversity is a property of a good ensemble where contributing models make different errors for the same input.
  • Seeking independent models and uncorrelated predictions provides a guide for thinking about and introducing diversity into ensemble models.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Get a Handle on Modern Ensemble Learning!

Ensemble Learning Algorithms With Python

Improve Your Predictions in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner, and much more...

Bring Modern Ensemble Learning Techniques to
Your Machine Learning Projects

See What's Inside

6 Responses to A Gentle Introduction to Ensemble Diversity for Machine Learning

  1. Avatar
    nmo May 14, 2021 at 6:35 pm #

    Thanks for this educative article.

    I am curious about the “Output Representation Manipulation”, where the target is modified let’s say for each member of the ensemble, is there any implementation for this?

  2. Avatar
    Nikhil Anival May 16, 2021 at 4:38 pm #

    Thanks for the informative information about Machine learning.

  3. Avatar
    Roger Federaj October 8, 2022 at 12:32 pm #

    So are there any resources for quantifying ensemble diversity?

Leave a Reply