How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

Ensembles can give you a boost in accuracy on your dataset.

In this post you will discover how you can create three of the most powerful types of ensembles in R.

This case study will step you through Boosting, Bagging and Stacking and show you how you can continue to ratchet up the accuracy of the models on your own datasets.

Let’s get started.

Build an Ensemble Of Machine Learning Algorithms in R. Photo by Barbara Hobbs, some rights reserved.

Increase The Accuracy Of Your Models

It can take time to find well-performing machine learning algorithms for your dataset, because of the trial-and-error nature of applied machine learning.

Once you have a shortlist of accurate models, you can use algorithm tuning to get the most from each algorithm.

Another approach that you can use to increase accuracy on your dataset is to combine the predictions of multiple different models together.

This is called an ensemble prediction.

Combine Model Predictions Into Ensemble Predictions

The three most popular methods for combining the predictions from different models are:

  • Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
  • Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction errors of a prior model in the chain.
  • Stacking. Building multiple models (typically of differing types) and a supervisor model that learns how to best combine the predictions of the primary models.

This post will not explain each of these methods. It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles with R.


Ensemble Machine Learning in R

You can create ensembles of machine learning algorithms in R.

There are three main techniques that you can use to create an ensemble of machine learning algorithms in R: Boosting, Bagging and Stacking. In this section, we will look at each in turn.

Before we start building ensembles, let’s define our test set-up.

Test Dataset

All of the examples of ensemble predictions in this case study will use the ionosphere dataset.

This is a dataset available from the UCI Machine Learning Repository. It describes high-frequency antenna returns from high-energy particles in the atmosphere and whether or not the return shows structure. The problem is a binary classification task with 351 instances and 34 numerical attributes plus the class attribute.

Let’s load the libraries and the dataset.
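A minimal sketch of this step is below, assuming the copy of the Ionosphere dataset that ships with the mlbench package; the two transformations match the notes that follow.

# Load the required libraries
library(mlbench)        # provides the Ionosphere dataset
library(caret)
library(caretEnsemble)

# Load the dataset
data(Ionosphere)
dataset <- Ionosphere
# Drop the second attribute, which is a constant
dataset <- dataset[,-2]
# Convert the first attribute from a factor (0,1) to numeric
dataset$V1 <- as.numeric(as.character(dataset$V1))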

Note that the first attribute was a factor (0,1) and has been transformed to be numeric for consistency with all of the other numeric attributes. Also note that the second attribute is a constant and has been removed.

Here is a sneak-peek at the first few rows of the ionosphere dataset.
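Assuming the dataset object prepared above, you can view them with:

# Display the first few rows of the prepared dataset
head(dataset)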

For more information, see the description of the Ionosphere dataset on the UCI Machine Learning Repository.

See this summary of published world-class results on the dataset.

1. Boosting Algorithms

We can look at two of the most popular boosting machine learning algorithms:

  • C5.0
  • Stochastic Gradient Boosting

Below is an example of the C5.0 and Stochastic Gradient Boosting (using the gbm implementation) algorithms in R. Both algorithms include parameters that are not tuned in this example.
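The listing below is a minimal sketch of this comparison; the repeated 10-fold cross-validation set-up and the seed value are illustrative choices, and the caret method names "C5.0" and "gbm" require the C50 and gbm packages respectively.

# Example of boosting algorithms (a sketch; control settings are illustrative)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"

# C5.0
set.seed(seed)
fit.c50 <- train(Class~., data=dataset, method="C5.0", metric=metric, trControl=control)

# Stochastic Gradient Boosting
set.seed(seed)
fit.gbm <- train(Class~., data=dataset, method="gbm", metric=metric, trControl=control, verbose=FALSE)

# Summarize and compare the resampled accuracy of the two models
boosting_results <- resamples(list(c5.0=fit.c50, gbm=fit.gbm))
summary(boosting_results)
dotplot(boosting_results)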

We can see that the C5.0 algorithm produces a more accurate model with an accuracy of 94.58%.

Boosting Machine Learning Algorithms in R

Learn more about caret boosting models here: Boosting Models.

2. Bagging Algorithms

Let’s look at two of the most popular bagging machine learning algorithms:

  • Bagged CART
  • Random Forest

Below is an example of the Bagged CART and Random Forest algorithms in R. Both algorithms include parameters that are not tuned in this example.
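The sketch below mirrors the boosting example; the resampling settings are again illustrative, and the caret method names "treebag" and "rf" require the ipred and randomForest packages respectively.

# Example of bagging algorithms (a sketch; control settings are illustrative)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"

# Bagged CART
set.seed(seed)
fit.treebag <- train(Class~., data=dataset, method="treebag", metric=metric, trControl=control)

# Random Forest
set.seed(seed)
fit.rf <- train(Class~., data=dataset, method="rf", metric=metric, trControl=control)

# Summarize and compare the resampled accuracy of the two models
bagging_results <- resamples(list(treebag=fit.treebag, rf=fit.rf))
summary(bagging_results)
dotplot(bagging_results)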

We can see that random forest produces a more accurate model with an accuracy of 93.25%.

Bagging Machine Learning Algorithms in R

Learn more about caret bagging model here: Bagging Models.

3. Stacking Algorithms

You can combine the predictions of multiple caret models using the caretEnsemble package.

Given a list of caret models, the caretStack() function can be used to specify a higher-order model that learns how to best combine the predictions of the sub-models.

Let’s first look at creating 5 sub-models for the ionosphere dataset, specifically:

  • Linear Discriminant Analysis (LDA)
  • Classification and Regression Trees (CART)
  • Logistic Regression (via Generalized Linear Model or GLM)
  • k-Nearest Neighbors (kNN)
  • Support Vector Machine with a Radial Basis Kernel Function (SVM)

Below is an example that creates these 5 sub-models. Note the new helpful caretList() function provided by the caretEnsemble package for creating a list of standard caret models.
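The listing below is a minimal sketch; the resampling settings are illustrative, but savePredictions and classProbs do need to be enabled so that the stacking step later can reuse the sub-model predictions.

# Example of stacking: create the sub-models (a sketch)
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions="final", classProbs=TRUE)
algorithmList <- c('lda', 'rpart', 'glm', 'knn', 'svmRadial')
set.seed(7)
models <- caretList(Class~., data=dataset, trControl=control, methodList=algorithmList)

# Summarize and compare the resampled accuracy of the sub-models
results <- resamples(models)
summary(results)
dotplot(results)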

We can see that the SVM creates the most accurate model with an accuracy of 94.66%.

Comparison of Sub-Models for Stacking Ensemble in R

When we combine the predictions of different models using stacking, it is desirable that the predictions made by the sub-models have low correlation. This would suggest that the models are skillful but in different ways, allowing a new classifier to figure out how to get the best from each model for an improved score.

If the predictions of the sub-models were highly correlated (> 0.75), they would be making the same or very similar predictions most of the time, reducing the benefit of combining them.
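caret can report this via the correlation between the sub-models’ resampled results, a sketch:

# Correlation between the results of the sub-models
modelCor(results)
splom(results)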

We can see that all pairs of predictions have generally low correlation. The two methods with the highest correlation between their predictions are Logistic Regression (GLM) and kNN, at 0.517, which is still below the 0.75 level we would consider high.

Correlations Between Predictions Made By Sub-Models in Stacking Ensemble

Let’s combine the predictions of the classifiers using a simple linear model.
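A sketch using the caretStack() function with a GLM meta-model is below; the stackControl settings are illustrative and mirror those used for the sub-models.

# Stack the sub-model predictions using a GLM meta-model (a sketch)
stackControl <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions="final", classProbs=TRUE)
set.seed(7)
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)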

We can see that we have lifted the accuracy to 94.99%, which is a small improvement over using SVM alone. This is also an improvement over using random forest alone on the dataset, as observed above.

We can also use more sophisticated algorithms to combine predictions in an effort to tease out when best to use the different methods. In this case, we can use the random forest algorithm to combine the predictions.
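A sketch, reusing the stackControl object defined above:

# Stack the sub-model predictions using a random forest meta-model (a sketch)
set.seed(7)
stack.rf <- caretStack(models, method="rf", metric="Accuracy", trControl=stackControl)
print(stack.rf)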

We can see that this has lifted the accuracy to 96.26%, an impressive improvement over SVM alone.

You Can Build Ensembles in R

You do not need to be an R programmer. You can copy and paste the sample code from this blog post to get started. Study the functions used in the examples using the built-in help in R.

You do not need to be a machine learning expert. Creating ensembles can be very complex if you are doing it from scratch. The caret and caretEnsemble packages allow you to start creating and experimenting with ensembles even if you don’t have a deep understanding of how they work. Read up on each type of ensemble to get more out of them at a later time.

You do not need to collect your own data. The data used in this case study came from the mlbench package. You can use standard machine learning datasets like this to learn, use and experiment with machine learning algorithms.

You do not need to write your own ensemble code. Some of the most powerful algorithms for creating ensembles are provided by R, ready to run. Use the examples in this post to get started right now. You can always adapt them to your specific cases or try out new ideas with custom code at a later time.

Summary

In this post you discovered that you can use ensembles of machine learning algorithms to improve the accuracy of your models.

You discovered three types of ensembles of machine learning algorithms that you can build in R:

  • Boosting
  • Bagging
  • Stacking

You can use the code in this case study as a template on your current or next machine learning project in R.

Next Step

Did you work through the case study?

  1. Start your R interactive environment.
  2. Type or copy-paste all of the code in this case study.
  3. Take the time to understand each part of the case study using the help for R functions.

Do you have any questions about this case study or using ensembles in R? Leave a comment and ask and I will do my best to answer.


54 Responses to How to Build an Ensemble Of Machine Learning Algorithms in R (ready to use boosting, bagging and stacking)

  1. Robert April 8, 2016 at 12:24 am #

    Thanks for posting Jason, very helpful.

    One question: In the stacked random forest and GLM ensemble models, how are the hyperparameters specified for each model (e.g., which value of k for k Nearest Neighbor, how many trees in the random forest)? I’m assuming default values are used, but can these values be tweaked within the caretEnsemble package?

    • Jason Brownlee April 8, 2016 at 1:36 pm #

      The algorithms use “sensible” defaults. I think 100 trees for RF, I don’t recall the value for K in KNN. The caret docs will specify the default or you can print the model after training.

  2. Sreenath April 19, 2016 at 11:19 pm #

    Hi Jason,

    I started following your posts recently and so far this is one of the best posts I have come across, not just on this website but among all the ML websites I have visited. Really appreciate your efforts to explain such a useful topic in a simple & clear manner. I will definitely try out this technique soon. Thanks again..

  3. pradnya May 6, 2016 at 5:51 am #

    One of the best articles I have come across. Super helpful

    Thanks!!

  4. Ahmed Mohy June 3, 2016 at 6:19 pm #

    One question: how do I stack models built on different subsets of a dataset into one model? I tried caretStack but I got an error.

    • Jason Brownlee June 15, 2016 at 5:54 am #

      Sorry to hear that. What error did you get?

      Did the above example work for you?

  5. Ramasubramaniam June 14, 2016 at 8:51 pm #

    Fantastic tutorial!! One of the most simple and elegant ways of introducing ensembles. As someone mentioned in an earlier comment, this is one of the cutest resources for introducing ensembles.

  6. Mudit June 18, 2016 at 5:07 am #

    models <- caretList(Loan_Status~., data=train1, trControl=control, methodList=algorithmList)

    Throws a error

    Error in { : task 1 failed – "argument is not interpretable as logical"
    In addition: There were 32 warnings (use warnings() to see them)

    any suggestion ?

    • Jason Brownlee June 18, 2016 at 6:39 am #

      It looks like you are adapting the example to your own problem, nice work!

      Were you able to reproduce the example in the tutorial first? This will confirm your environment is working correctly.

      I have not seen this error before, but I suspect it is related to your dataset. Confirm the data is in a form to make it easy to model, such as all numeric attributes and a factor being predicted.

      You can ignore the warnings for now.

      • Mudit June 19, 2016 at 2:19 am #

        Hi Jason,

        The environment is fine, as I have tried your other code as well and it runs fine……

        Can I send you the dataset I’m using to your mail id, so you can help me with the code and let me know what mistake I’m making….just to give you a heads up, I have also converted all the variables with as.numeric before applying the algorithm, but I’m still facing the same issue.

        Please help me with your mail id so I can send the dataset.

        Regards
        Mudit

  7. Mudit June 22, 2016 at 1:41 am #

    Hi Jason,

    I totally respect your time constraints……..it’s just that when I’m stuck with code it gets hard to debug…seeing the same code working fine on another dataset….

    Regards
    Mudit

    • Rajagopalan Kannan July 1, 2016 at 2:49 am #

      Hi Mudit,
      I see you are participating in the AV Loan default prediction competition. I am also participating in the same in learning mode and my score (so far) is the same as yours. At present I am working on stacking using caretEnsemble; I have not come across the error that you are seeing. Would you be able to provide more info or share code?

      Thanks,
      Raj

  8. Komal Sinha July 1, 2016 at 5:27 am #

    I need to combine 6 different predictive models; this is a 7 class problem.
    Each model predicts output probabilities for being in all seven classes (lib SVMs using the -b 1 option) and I need to combine them to get a better model. I just have to use the predictions of these models to train the ensemble.
    Can you please tell me how to do ensemble learning for a multiclass problem and how this can be done in Matlab? Please reply as soon as possible. I shall be thankful. Thanks!

    • Jason Brownlee July 1, 2016 at 5:44 am #

      Sorry, I don’t have any examples in Matlab at the moment.

  9. Rajagopalan Kannan July 1, 2016 at 6:28 am #

    Hi Jason,
    I have built many models with different pre-processing done on each model, and each model is uniquely tuned by its own tuning parameters, and I want to stack them now. How do I get them into caretStack, given that caretStack requires a caretList as its input but I have generated the models using the caret train function? Hope you get what I am saying. Appreciate your response.

    Cheers,
    Raj

    • Rajagopalan Kannan July 1, 2016 at 6:44 am #

      Jason, no worries. I figured it out. Thanks.

      • Ernest September 13, 2016 at 6:37 am #

        How did you do it? I am currently working on one such problem. Could you advise me on how to go about it? Using a caretList with different preprocessing and different variables for each model to build a stacked ensemble.

  10. Seun August 3, 2016 at 3:46 am #

    Hi Jason,
    I am currently trying to construct an ensemble of classifiers using the stacking technique. I am currently using the Naive Bayes and J48 classifiers as the base learners with Random Forest as the meta learner. Where can I find the correlation values for predictions in Weka?
    Thanks

    • Jason Brownlee August 3, 2016 at 8:21 am #

      Hi Seun, if you are looking to investigate the correlation between prediction errors in Weka, you will have to do this manually. I don’t believe there is a facility for it.

      You will have to make predictions on a test dataset for each classifier, save them to CSV files and then calculate the PEARSON() correlation in MS Excel or similar.

      I hope that helps.

      • Seun August 10, 2016 at 10:43 am #

        Hi Jason, thanks. That sounds like a lot of work.

  11. babi September 27, 2016 at 5:48 pm #

    What changes should be made if I deal with a categorical feature space with a large number of levels? On execution, this code shows an error: “Error in train.default(x, y, weights = w, …) : Stopping
    In addition: There were 33 warnings (use warnings() to see them)”..

    • babi September 27, 2016 at 5:49 pm #

      I was talking about executing stacking.

    • Jason Brownlee September 28, 2016 at 7:40 am #

      A hard question babi. Try converting the categorical features to binary features and see if that makes a difference.

  12. Narendra Prasad K October 18, 2016 at 5:48 am #

    Thanks a lot Jason.. This article is very helpful for ensembling models.

    I tried it on one of our regression problems, and the RMSE value increased..

    Is there any article on merging models other than with the caret packages, if that is ok…

    Thanks,
    Narendra Prasad K

    • Jason Brownlee October 18, 2016 at 5:56 am #

      You can merge the predictions from models manually, that is how we had to do it before caretEnsemble.

  13. Siddhesh November 6, 2016 at 6:48 am #

    Can I merge classification models like logistic regression and linear regression? If so, which method should be used?

    • Jason Brownlee November 7, 2016 at 7:11 am #

      Hi Siddhesh,

      Yes, you can create an ensemble by combining the predictions from these models. You can use voting (mode), take the average or use another model to learn how to best combine the predictions (called stacked generalization or stacking).

  14. Ajas November 19, 2016 at 8:37 am #

    Hey Jason,

    The tutorial is awesome, but I am not able to install the caretEnsemble pkg for R version 3.3.1. Are there any other pkgs available for the same task, or any workaround?

    Thanks for the help..!..

    • Jason Brownlee November 19, 2016 at 8:55 am #

      I’m sorry to hear that.

      Perhaps you could update to 3.3.2 and try again?

      Perhaps check stack overflow to see if anyone else has your error?

      You may even want to try to R mailing list, if you’re brave.

  15. Surya November 21, 2016 at 6:59 pm #

    models <- caretList(Item_Outlet_Sales~., data=BigMartimp, trControl=trainControl, methodList=algorithmList)
    results <- resamples(models)
    summary(results)
    dotplot(results)
    modelCor(results)
    splom(results)

    From the above we found that the decision tree has low correlation with all the other models used in the method list.

    So while creating the stacked ensemble model, do we need to remove rpart and train the models again before stacking?

    stack.rf <- caretStack(models, method="rf", metric="RMSE", trControl=stackControl)

    • Jason Brownlee November 22, 2016 at 7:01 am #

      Good question Surya,

      You could create the stacked model yourself. I believe a limitation of caretStack is that it expects to train the submodels itself.

      • Abhi February 21, 2017 at 4:17 pm #

        Do you know where I can find code to implement Stacked Models in R? I am interested in tuning my xgboost models and stacking them with other optimized models. Any pointers towards examples/implementations would be awesome!

        • Jason Brownlee February 22, 2017 at 9:57 am #

          Does the stacking example in this post not help?

          XGBoost is available in R and caret, and can be used from caretEnsemble.

  16. Surya November 21, 2016 at 8:34 pm #

    Let’s say you have got predictions where there are two models which are highly correlated (>0.75). Should we apply the same code as below for the mentioned example in the article?

    # stack using random forest
    set.seed(seed)
    stack.rf <- caretStack(models, method="rf", metric="Accuracy", trControl=stackControl)
    print(stack.rf)

    • Jason Brownlee November 22, 2016 at 7:05 am #

      I would suggest trying it to see if you can lift the performance by using a stacked model. Even just a few better predictions can help model performance.

    • Surya November 22, 2016 at 6:21 pm #

      Sorry, if you can elaborate on this it will be great. How should the stacking algorithm be used when one model has low correlation with all the other models?

      • Jason Brownlee November 23, 2016 at 8:55 am #

        I don’t remember the specifics of the paper, but it was along the lines of better performance when predictions between weak learners are uncorrelated (or maybe it was the errors being uncorrelated). This may have been in PAC learning theory – it has been a while, sorry.

  17. Gunwoo Nam November 30, 2016 at 1:12 pm #

    Hi, thanks for nice introduction to ensemble techniques.

    I have one question. Can I ensemble one method repeatedly?
    It seems caretEnsemble mixes different methods.
    What if I want to ensemble a single method many times? (like boosting)

    The train function seems to help resample the data and apply a method repeatedly. But does that mean bagging or boosting? Or does it just help to pick the best parameters for a single method?

    Thanks in advance.

    • Jason Brownlee December 1, 2016 at 7:23 am #

      You can, but you may not get any gain, Gunwoo.

      You could create 10 neural nets and take the mean prediction – this would probably outperform a stacking approach.

      You can put other methods inside bagging (I don’t remember the package name off hand), but bagging works so much better with high variance methods – like unpruned trees or low-k in KNN with subsampled training data, etc.

      At the end of the day, try different methods and see what gives the best results on your problem.

  18. Anh Bui December 6, 2016 at 1:25 am #

    Thanks for your writing!
    Can I ask you a question?
    What is the maximum number of models that can be combined in this package?

    • Jason Brownlee December 6, 2016 at 8:27 am #

      I don’t know Anh. Perhaps it is limited by memory and CPU power.

      • Anh Bui December 6, 2016 at 8:58 pm #

        I got it. Thank you so much

  19. Jay Hyunwoo Jeong December 19, 2016 at 1:53 am #

    This post is the most useful job that I’ve seen about ensemble modeling!!
    I have a question.

    The lines below seem to show correlations between the resampled accuracy measures of each model, not correlations between the predictions.

    # correlation between results
    modelCor(results)
    splom(results)

    Can correlations between accuracy measures be interpreted as correlations between the predictions made by each model?

  20. Aadi January 5, 2017 at 8:54 pm #

    Hello Jason, first I would like to thank you for such a nice article on ensemble methods. I have a question that may be naive, but I am confused.
    My question is: since ensemble methods are applied to a base classifier model (naive Bayes classifier, SVM etc.) to improve the accuracy of the base model, how do we choose which classifier to apply ensemble methods to?
    How do we know which classifier is best for applying ensemble methods?

    • Jason Brownlee January 6, 2017 at 9:09 am #

      Great question Aadi.

      Generally, we cannot know beforehand. Use trial and error.

      Perhaps we can ensemble a suite of well-performing models in the case of stacked generalization.

      Perhaps we can ensemble a suite of well performing high variance models in the case of bagging.

      It is really problem specific. This post will shed more light on the open problem of algorithm selection:
      http://machinelearningmastery.com/a-data-driven-approach-to-machine-learning/

  21. Carl Turner January 26, 2017 at 7:58 am #

    Best explanation of ensembles I’ve seen. Thanks for posting this

  22. shikha March 23, 2017 at 4:00 pm #

    We need to combine C5.0 and random forest for our dataset. Can we do it, and how?

    • Jason Brownlee March 24, 2017 at 7:52 am #

      Yes, the tutorial above should help. Sorry I cannot write the code for you.

Leave a Reply