How to Use Ensemble Machine Learning Algorithms in Weka

Ensemble algorithms are a powerful class of machine learning algorithms that combine the predictions from multiple models.

A benefit of using Weka for applied machine learning is that it makes available so many different ensemble machine learning algorithms.

In this post you will discover how to use ensemble machine learning algorithms in Weka.

After reading this post you will know:

  • About 5 top ensemble machine learning algorithms.
  • How to use top ensemble algorithms in Weka.
  • About key configuration parameters for ensemble algorithms in Weka.

Let’s get started.

How to Use Ensemble Machine Learning Algorithms in Weka
Photo by LEONARDO DASILVA, some rights reserved.

Ensemble Algorithms Overview

We are going to take a tour of 5 top ensemble machine learning algorithms in Weka.

Each algorithm that we cover will be briefly described in terms of how it works, key algorithm parameters will be highlighted and the algorithm will be demonstrated in the Weka Explorer interface.

The 5 algorithms that we will review are:

  1. Bagging
  2. Random Forest
  3. AdaBoost
  4. Voting
  5. Stacking

These are 5 algorithms that you can try on your problem in order to lift performance.

A standard machine learning classification problem will be used to demonstrate each algorithm.

Specifically, the Ionosphere binary classification problem. This is a good dataset for demonstrating classification algorithms because the input variables are numeric and all have the same scale, and the problem has only two classes to discriminate.

Each instance describes the properties of radar returns from the atmosphere and the task is to predict whether or not there is structure in the ionosphere. There are 34 numeric input variables of generally the same scale. You can learn more about this dataset on the UCI Machine Learning Repository. Top results are in the order of 98% accuracy.

Start the Weka Explorer:

  1. Open the Weka GUI Chooser.
  2. Click the “Explorer” button to open the Weka Explorer.
  3. Load the Ionosphere dataset from the data/ionosphere.arff file.
  4. Click “Classify” to open the Classify tab.
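
If you prefer to script Weka rather than drive the Explorer, the same dataset can be loaded with the Weka Java API. The snippet below is a minimal sketch, assuming a recent Weka release (e.g. 3.8) on the classpath and the ionosphere.arff file under a data/ directory; the class name and path are only for illustration.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LoadIonosphere {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file (path is an assumption; adjust to where your copy lives)
        Instances data = DataSource.read("data/ionosphere.arff");
        // The class attribute is the last column in this dataset
        data.setClassIndex(data.numAttributes() - 1);
        System.out.println("Loaded " + data.numInstances() + " instances with "
                + (data.numAttributes() - 1) + " input attributes.");
    }
}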


Bootstrap Aggregation (Bagging)

Bootstrap Aggregation or Bagging for short is an ensemble algorithm that can be used for classification or regression.

Bootstrap is a statistical estimation technique where a statistical quantity like a mean is estimated from multiple random samples of your data (with replacement). It is a useful technique when you have a limited amount of data and you are interested in a more robust estimate of a statistical quantity.

This same principle can be used with machine learning models. Multiple random samples of your training data are drawn with replacement and used to train multiple different machine learning models. Each model is then used to make a prediction and the results are averaged to give a more robust prediction.
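
To make the resampling idea concrete, the short sketch below draws a single bootstrap sample with the Weka Java API; Instances.resample() samples with replacement, so the new dataset has the same size but a different composition. The file path and class name are assumptions for illustration.

import java.util.Random;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BootstrapSampleDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // One bootstrap sample: same size as the original, drawn with replacement,
        // so some instances appear more than once and others are left out.
        Instances bootstrap = data.resample(new Random(1));
        System.out.println("Original: " + data.numInstances()
                + " instances, bootstrap sample: " + bootstrap.numInstances() + " instances");
    }
}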

It is a technique best used with models that have low bias and high variance, meaning that the predictions they make are highly dependent on the specific data from which they were trained. The most commonly used algorithm for bagging that fits this requirement of high variance is the decision tree.

Choose the bagging algorithm:

  1. Click the “Choose” button and select “Bagging” under the “meta” group.
  2. Click on the name of the algorithm to review the algorithm configuration.
Weka Configuration for the Bagging Algorithm

A key configuration parameter in bagging is the type of model being bagged. The default is REPTree, which is the Weka implementation of a standard decision tree, also called a Classification and Regression Tree or CART for short. This is specified in the classifier parameter.

The size of each random sample is specified in the bagSizePercent parameter, as a percentage of the raw training dataset. The default is 100%, which will create a new random sample the same size as the training dataset, but with a different composition.

This is because the random sample is drawn with replacement, which means that each time an instance is randomly drawn from the training dataset and added to the sample, it is also added back into the training dataset (replaced), meaning that it can be chosen again and added to the sample two or more times.

Finally, the number of bags (and number of classifiers) can be specified in the numIterations parameter. The default is 10, although it is common to use values in the hundreds or thousands. Continue to increase the value of numIterations until you no longer see an improvement in the model, or you run out of memory.
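
For reference, the configuration described above can also be set programmatically. The following is a minimal sketch using the Weka Java API (assuming Weka 3.8 and the dataset path used earlier); the setters correspond to the classifier, bagSizePercent and numIterations fields in the configuration dialog, and accuracy is estimated with 10-fold cross-validation.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggingExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bagging = new Bagging();
        bagging.setClassifier(new REPTree()); // the model being bagged (the default)
        bagging.setBagSizePercent(100);       // each sample is 100% of the training set size
        bagging.setNumIterations(10);         // number of bags / classifiers

        // Estimate accuracy with 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(bagging, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}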

  1. Click “OK” to close the algorithm configuration.
  2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration bagging achieves an accuracy of 91%.

Weka Classification Results for the Bagging Algorithm

Random Forest

Random Forest is an extension of bagging for decision trees that can be used for classification or regression.

A downside of bagged decision trees is that decision trees are constructed using a greedy algorithm that selects the best split point at each step in the tree-building process. As such, the resulting trees end up looking very similar, which means their predictions are highly correlated; this reduces the benefit of averaging across the bags and in turn hurts the robustness of the predictions made.

Random Forest is an improvement upon bagged decision trees that disrupts the greedy splitting algorithm during tree creation so that split points can only be selected from a random subset of the input attributes. This simple change can have a big effect, decreasing the similarity between the bagged trees and in turn the resulting predictions.

Choose the random forest algorithm:

  1. Click the “Choose” button and select “RandomForest” under the “trees” group.
  2. Click on the name of the algorithm to review the algorithm configuration.
Weka Configuration for the Random Forest Algorithm

In addition to the parameters listed above for bagging, a key parameter for random forest is the number of attributes to consider at each split point. In Weka this is controlled by the numFeatures parameter, which by default is set to 0, meaning the value is selected automatically based on a rule of thumb.
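
The equivalent configuration in the Weka Java API might look like the sketch below. It assumes Weka 3.8, where the number of trees is controlled by the numIterations parameter (older releases used a numTrees parameter instead); the file path and class name are illustrative.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest forest = new RandomForest();
        forest.setNumFeatures(0);     // 0 = pick the number of attributes per split automatically
        forest.setNumIterations(100); // number of trees in the forest

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(forest, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}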

  1. Click “OK” to close the algorithm configuration.
  2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration random forest achieves an accuracy of 92%.

Weka Classification Results for the Random Forest Algorithm

AdaBoost

AdaBoost is an ensemble machine learning algorithm for classification problems.

It is part of a group of ensemble methods called boosting, which add new machine learning models in series, where subsequent models attempt to fix the prediction errors made by prior models. AdaBoost was the first successful implementation of this type of model.

AdaBoost was designed to use short decision tree models, each with a single decision point. Such short trees are often referred to as decision stumps.

The first model is constructed as per normal. Each instance in the training dataset is weighted and the weights are updated based on the overall accuracy of the model and whether an instance was classified correctly or not. Subsequent models are trained and added until a minimum accuracy is achieved or no further improvements are possible. Each model is weighted based on its skill and these weights are used when combining the predictions from all of the models on new data.

Choose the AdaBoost algorithm:

  1. Click the “Choose” button and select “AdaBoostM1” under the “meta” group.
  2. Click on the name of the algorithm to review the algorithm configuration.
Weka Configuration for the AdaBoost Algorithm

The weak learner within the AdaBoost model can be specified by the classifier parameter.

The default is the decision stump algorithm, but other algorithms can be used. A key parameter in addition to the weak learner is the number of models to create and add in series. This can be specified in the numIterations parameter and defaults to 10.
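
A minimal programmatic sketch of this configuration, assuming the same Weka version and dataset path as above, is shown below; the classifier and numIterations setters mirror the fields in the configuration dialog.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AdaBoostExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new DecisionStump()); // the weak learner (the default)
        boost.setNumIterations(10);               // number of models added in series

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(boost, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}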

  1. Click “OK” to close the algorithm configuration.
  2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration AdaBoost achieves an accuracy of 90%.

Weka Classification Results for the AdaBoost Algorithm

Voting

Voting is perhaps the simplest ensemble algorithm, and is often very effective. It can be used for classification or regression problems.

Voting works by creating two or more sub-models. Each sub-model makes predictions which are combined in some way, such as by taking the mean or the mode of the predictions, allowing each sub-model to vote on what the outcome should be.

Choose the Vote algorithm:

  1. Click the “Choose” button and select “Vote” under the “meta” group.
  2. Click on the name of the algorithm to review the algorithm configuration.
Weka Configuration for the Voting Ensemble Algorithm

The key parameter of a Vote ensemble is the selection of sub-models.

Sub-models can be specified in Weka in the classifiers parameter. Clicking this parameter lets you add a number of classifiers.

Weka Algorithm Selection for the Voting Ensemble Algorithm

Clicking the “Edit” button with a classifier selected lets you configure the details of that classifier. An objective in selecting sub-models is to choose models that make quite different predictions (uncorrelated predictions). As such, it is a good rule of thumb to select very different model types, such as trees, instance-based methods, functions and so on.

Another key parameter to configure for voting is how the predictions of the sub models are combined. This is controlled by the combinationRule parameter which is set to take the average of the probabilities by default.
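
A programmatic sketch of a Vote ensemble is shown below, assuming the same Weka version and dataset path as in the earlier examples. The particular sub-models (J48, IBk, Logistic and NaiveBayes) are just one illustrative mix of different model types, not a recommendation from the Weka documentation, and the combination rule is set explicitly to the default of averaging class probabilities.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class VoteExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Vote vote = new Vote();
        // Use quite different model types so their errors are less correlated
        vote.setClassifiers(new Classifier[] {
            new J48(), new IBk(), new Logistic(), new NaiveBayes() });
        // Combine sub-model predictions by averaging class probabilities (the default rule)
        vote.setCombinationRule(new SelectedTag(Vote.AVERAGE_RULE, Vote.TAGS_RULES));

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(vote, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}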

  1. Click “OK” to close the algorithm configuration.
  2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration Vote achieves an accuracy of 64%.

Obviously, this technique achieved poor results because only the ZeroR sub-model was selected. Try selecting a collection of 5-to-10 different sub-models.

Stacked Generalization (Stacking)

Stacked Generalization or Stacking for short is a simple extension to Voting ensembles that can be used for classification and regression problems.

In addition to selecting multiple sub-models, stacking allows you to specify another model to learn how to best combine the predictions from the sub-models. Because a meta model is used to best combine the predictions of sub-models, this technique is sometimes called blending, as in blending predictions together.

Choose the Stacking algorithm:

  1. Click the “Choose” button and select “Stacking” under the “meta” group.
  2. Click on the name of the algorithm to review the algorithm configuration.
Weka Configuration for the Stacking Ensemble Algorithm

As with the Vote classifier, you can specify the sub-models in the classifiers parameter.

The model that will be trained to learn how to best combine the predictions from the sub-models can be specified in the metaClassifier parameter, which is set to ZeroR (majority vote or mean) by default. It is common to use a linear algorithm like linear regression or logistic regression for regression and classification problems respectively. This achieves an output that is a simple linear combination of the predictions of the sub-models.
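
The sketch below shows one way to configure Stacking through the Weka Java API, again assuming Weka 3.8 and the dataset path used earlier; the chosen sub-models are illustrative, and logistic regression is used as the meta classifier as suggested above.

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackingExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data/ionosphere.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Stacking stacking = new Stacking();
        stacking.setClassifiers(new Classifier[] { new J48(), new IBk(), new NaiveBayes() });
        // A simple linear model learns how to combine the sub-model predictions
        stacking.setMetaClassifier(new Logistic());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(stacking, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}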

  1. Click “OK” to close the algorithm configuration.
  2. Click the “Start” button to run the algorithm on the Ionosphere dataset.

You can see that with the default configuration Stacking achieves an accuracy of 64%.

Again, as with voting, Stacking achieved poor results because only the ZeroR sub-model was selected. Try selecting a collection of 5-to-10 different sub-models and a good model to combine the predictions.

Weka Classification Results for the Stacking Ensemble Algorithm

Summary

In this post you discovered how to use ensemble machine learning algorithms in Weka.

Specifically you learned:

  • About 5 ensemble machine learning algorithms that you can use on your problem.
  • How to use ensemble machine learning algorithms in Weka.
  • About the key configuration parameters for ensemble machine learning algorithms in Weka.

Do you have any questions about ensemble machine learning algorithms in Weka or about this post? Ask your questions in the comments and I will do my best to answer.




11 Responses to How to Use Ensemble Machine Learning Algorithms in Weka

  1. Seun August 17, 2016 at 6:30 am #

    Hi Jason,
    I noticed that when I used stacking to combine series of classifiers, there was a significant amount of difference in the ROC of the classifiers used when the meta and base learners were interchanged. For instance, when I used Naive Bayes as a base learner and J48 as the meta learner, ROC of 0.805 was recorded but when I used J48 as the base learner and Naive Bayes as the meta learner, ROC of 0.926 was recorded. This similar pattern was noticed for some other algorithms used. Could you please shed more light into the most likely reasons for this occurrence?
    Thanks

    • Jason Brownlee August 17, 2016 at 10:04 am #

      Interesting finding, I don’t know off hand Seun.

      • Seun August 17, 2016 at 10:21 am #

        Alright Jason. Thanks

  2. Lebron April 16, 2017 at 2:59 pm #

    Hi Jason,

    I am new to Weka and trying to use random forest to train a regression model. I have three questions as below.

    1) Cross-validation:
    I am using 10-fold cross-validation but I wonder whether the data is shuffled before divided into 10 folds. Specifically, why is the performance statistic always same for the same algorithm?
    Furthermore, what is the final model trained by cross-validation? As I see the performance statistic is averaged by 10 models, but how is the final model derived?

    2) Random Forest
    I am using the default parameters and 10-fold cross-validation, same question as 1), why the performance is always the same? I thought random forest would randomly select the instances as training set, and also select a random subset of the input attributes for each tree. But the training performance is always the same every time I perform random forest on my dataset.

    Thanks.

  3. curso viver melhor agora July 17, 2017 at 7:24 pm #

    Hello, all is going fine here and ofcourse every one is sharing data, that’s actually fine, keep up writing.

  4. elahe August 28, 2017 at 12:42 am #

    hi Mr jason
    I used a combination of algorithms to predict the number of defects
    I want to use the weighted average method to combine algorithms. Is the weighted mean method implemented in Weka? How?
    Please Guide me

    • Jason Brownlee August 28, 2017 at 6:50 am #

      You could take the mean of the predictions directly or you could use a new model to learn how to best combine the predictions.

  5. Arwa October 10, 2017 at 4:14 pm #

    Hi Jason;
    I used Random Forest and I want to know why there is misclassifying in some testing data. Is there some specific way to know that.

    • Jason Brownlee October 10, 2017 at 4:45 pm #

      There will always be prediction errors given noise in the data and a mismatch between the model and the true (but unknown) underlying mapping between inputs and outputs.

      We use skill scores to get an idea of how well a model performs on average when making predictions on new data.
