Boosting and AdaBoost for Machine Learning

Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers.

In this post you will discover the AdaBoost Ensemble method for machine learning. After reading this post, you will know:

  • What the boosting ensemble method is and generally how it works.
  • How to learn to boost decision trees using the AdaBoost algorithm.
  • How to make predictions using the learned AdaBoost model.
  • How to best prepare your data for use with the AdaBoost algorithm

This post was written for developers and assumes no background in statistics or mathematics. The post focuses on how the algorithm works and how to use it for predictive modeling problems. If you have any questions, leave a comment and I will do my best to answer.

Kick-start your project with my new book Master Machine Learning Algorithms, including step-by-step tutorials and the Excel Spreadsheet files for all examples.

Let’s get started.

Boosting and AdaBoost for Machine Learning

Boosting and AdaBoost for Machine Learning
Photo by KatieThebeau, some rights reserved.

Boosting Ensemble Method

Boosting is a general ensemble method that creates a strong classifier from a number of weak classifiers.

This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models are added.

AdaBoost was the first really successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting.

Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.

Get your FREE Algorithms Mind Map

Machine Learning Algorithms Mind Map

Sample of the handy machine learning algorithms mind map.

I've created a handy mind map of 60+ algorithms organized by type.

Download it, print it and use it. 


Also get exclusive access to the machine learning algorithms email mini-course.

 

 

Learning An AdaBoost Model From Data

AdaBoost is best used to boost the performance of decision trees on binary classification problems.

AdaBoost was originally called AdaBoost.M1 by the authors of the technique Freund and Schapire. More recently it may be referred to as discrete AdaBoost because it is used for classification rather than regression.

AdaBoost can be used to boost the performance of any machine learning algorithm. It is best used with weak learners. These are models that achieve accuracy just above random chance on a classification problem.

The most suited and therefore most common algorithm used with AdaBoost are decision trees with one level. Because these trees are so short and only contain one decision for classification, they are often called decision stumps.

Each instance in the training dataset is weighted. The initial weight is set to:

weight(xi) = 1/n

Where xi is the i’th training instance and n is the number of training instances.

How To Train One Model

A weak classifier (decision stump) is prepared on the training data using the weighted samples. Only binary (two-class) classification problems are supported, so each decision stump makes one decision on one input variable and outputs a +1.0 or -1.0 value for the first or second class value.

The misclassification rate is calculated for the trained model. Traditionally, this is calculated as:

error = (correct – N) / N

Where error is the misclassification rate, correct are the number of training instance predicted correctly by the model and N is the total number of training instances. For example, if the model predicted 78 of 100 training instances correctly the error or misclassification rate would be (78-100)/100 or 0.22.

This is modified to use the weighting of the training instances:

error = sum(w(i) * terror(i)) / sum(w)

Which is the weighted sum of the misclassification rate, where w is the weight for training instance i and terror is the prediction error for training instance i which is 1 if misclassified and 0 if correctly classified.

For example, if we had 3 training instances with the weights 0.01, 0.5 and 0.2. The predicted values were -1, -1 and -1, and the actual output variables in the instances were -1, 1 and -1, then the terrors would be 0, 1, and 0. The misclassification rate would be calculated as:

error = (0.01*0 + 0.5*1 + 0.2*0) / (0.01 + 0.5 + 0.2)

or

error = 0.704

A stage value is calculated for the trained model which provides a weighting for any predictions that the model makes. The stage value for a trained model is calculated as follows:

stage = ln((1-error) / error)

Where stage is the stage value used to weight predictions from the model, ln() is the natural logarithm and error is the misclassification error for the model. The effect of the stage weight is that more accurate models have more weight or contribution to the final prediction.

The training weights are updated giving more weight to incorrectly predicted instances, and less weight to correctly predicted instances.

For example, the weight of one training instance (w) is updated using:

w = w * exp(stage * terror)

Where w is the weight for a specific training instance, exp() is the numerical constant e or Euler’s number raised to a power, stage is the misclassification rate for the weak classifier and terror is the error the weak classifier made predicting the output variable for the training instance, evaluated as:

terror = 0 if(y == p), otherwise 1

Where y is the output variable for the training instance and p is the prediction from the weak learner.

This has the effect of not changing the weight if the training instance was classified correctly and making the weight slightly larger if the weak learner misclassified the instance.

AdaBoost Ensemble

Weak models are added sequentially, trained using the weighted training data.

The process continues until a pre-set number of weak learners have been created (a user parameter) or no further improvement can be made on the training dataset.

Once completed, you are left with a pool of weak learners each with a stage value.

Making Predictions with AdaBoost

Predictions are made by calculating the weighted average of the weak classifiers.

For a new input instance, each weak learner calculates a predicted value as either +1.0 or -1.0. The predicted values are weighted by each weak learners stage value. The prediction for the ensemble model is taken as a the sum of the weighted predictions. If the sum is positive, then the first class is predicted, if negative the second class is predicted.

For example, 5 weak classifiers may predict the values 1.0, 1.0, -1.0, 1.0, -1.0. From a majority vote, it looks like the model will predict a value of 1.0 or the first class. These same 5 weak classifiers may have the stage values 0.2, 0.5, 0.8, 0.2 and 0.9 respectively. Calculating the weighted sum of these predictions results in an output of -0.8, which would be an ensemble prediction of -1.0 or the second class.

Data Preparation for AdaBoost

This section lists some heuristics for best preparing your data for AdaBoost.

  • Quality Data: Because the ensemble method continues to attempt to correct misclassifications in the training data, you need to be careful that the training data is of a high-quality.
  • Outliers: Outliers will force the ensemble down the rabbit hole of working hard to correct for cases that are unrealistic. These could be removed from the training dataset.
  • Noisy Data: Noisy data, specifically noise in the output variable can be problematic. If possible, attempt to isolate and clean these from your training dataset.

Further Reading

Below are some machine learning texts that describe AdaBoost from a machine learning perspective.

Below are some seminal and good overview research articles on the method that may be useful if you are looking to dive deeper into the theoretical underpinnings of the method:

Summary

In this post you discovered the Boosting ensemble method for machine learning. You learned about:

  • Boosting and how it is a general technique that keeps adding weak learners to correct classification errors.
  • AdaBoost as the first successful boosting algorithm for binary classification problems.
  • Learning the AdaBoost model by weighting training instances and the weak learners themselves.
  • Predicting with AdaBoost by weighting predictions from weak learners.
  • Where to look for more theoretical background on the AdaBoost algorithm.

If you have any questions about this post or the Boosting or the AdaBoost algorithm ask in the comments and I will do my best to answer.

Discover How Machine Learning Algorithms Work!

Mater Machine Learning Algorithms

See How Algorithms Work in Minutes

...with just arithmetic and simple examples

Discover how in my new Ebook:
Master Machine Learning Algorithms

It covers explanations and examples of 10 top algorithms, like:
Linear Regression, k-Nearest Neighbors, Support Vector Machines and much more...

Finally, Pull Back the Curtain on
Machine Learning Algorithms

Skip the Academics. Just Results.

See What's Inside

104 Responses to Boosting and AdaBoost for Machine Learning

  1. Avatar
    Sagar Giri July 26, 2016 at 5:49 am #

    Thank You! The article was really helpful to understand the AdaBoost Algorithm.
    In the article, you said, “AdaBoost is best used to boost the performance of decision trees on binary classification problems.” what does that mean?

    Can’t this algorithm be used in non-binary classification problems like “Fruit Recognition System” where training set contains feature matrix and associated name of different fruit. I need to know this for my research project.

    • Avatar
      Jason Brownlee July 26, 2016 at 6:00 am #

      AdaBoost was designed for binary classification (two output classes) and makes use of decision trees.

      If you think the algorithm can be used on your problem, give it a shot and see if it works.

      • Avatar
        Sagar Giri July 26, 2016 at 6:31 am #

        So, choosing of machine learning algorithm is heuristic? That, I should keep on implementing different algorithms and choose the best one that fits into my problem and gives the best accuracy ?

        • Avatar
          Jason Brownlee July 26, 2016 at 8:03 am #

          Choosing the best algorithm and even the best representation of your problem is problem specific. You must discover the best combination empirically through experimentation.

  2. Avatar
    Jessica August 23, 2016 at 2:22 am #

    This article is very helpful !
    I have some questions about adaboost. First, every weak learner or classifier in adaboost is decision tree based, can other algorithms like KNN or SVM be the basic components of the ensemble learning? My second question is, how adaboost algorithm deal with large dynamic sequential dataset such as global weather data or sensor dataset?

    Thank you very much!

    • Avatar
      Jason Brownlee August 23, 2016 at 5:53 am #

      Hi Jessica. I guess other methods could be used, but traditionally the weak learner is one-level decision trees (decision stumps). A neural net might work if it was “weak”, as in only had one layer and maybe very few neurons (perhaps just one). It might be an interesting experiment, and I bet there is some literature on this if you’d like to check http://scholar.google.com

      I am not familiar with adaboost being used on time series. It may be possible, but I have not seen it. I expect large changes to the method may be required.

    • Avatar
      amar March 5, 2018 at 11:24 pm #

      hi JESSICA according to your question about the ADABOOST algorithm that is coincidently one of y questions about this algorithm ,and due to the fact you mentionned datasets on global weather data or sensor datasets such as channel of deferent radiometers and meteosats can i ask you are you doing research in the field of meteorology or precipitation and climate estimation from meteosat data ? because thats the field of my research study now and i have the same question about the possibility of use of other weak classifiers such as SVM ANN or instead of DECISION STUMPS , ? , perhaps mr jason BROWNLEE can help us with

      • Avatar
        Jason Brownlee March 6, 2018 at 6:14 am #

        Decision stumps are preferred in adaboost because they are weak learners.

        You could use other weak learners in theory, such as knn with a small sample or small k value.

    • Avatar
      Deependra October 14, 2021 at 4:38 am #

      How the stage values are decided while prediction of a new instance, does it uses the training stage values?

  3. Avatar

    Shouldn’t the error formula be

    error = (N – correct) / N ?

    Or otherwise you would get a negative misclassification error rate.

    • Avatar
      Niels January 27, 2017 at 8:42 pm #

      Obviously 😉

  4. Avatar
    Prashant Nemade November 2, 2016 at 4:51 pm #

    Hi Jason, Thank you for the detailed clarification on adaboost algorithm. I have a question on this. How training weights are being used in this adaboost algorithm (meaning does adaboost algorithm repeat observations basis weights while building model or is it being used in some different way)?

    • Avatar
      Jason Brownlee November 3, 2016 at 7:55 am #

      Hi Prashant, the weights are used during the construction of subsequent decision stumps.

  5. Avatar
    Shreejit April 3, 2017 at 10:05 am #

    Hi Jason,

    We would have a bunch of weights and models in the end. How is the final prediction decided for the example?

  6. Avatar
    Ntate Ndaba April 10, 2017 at 10:26 pm #

    Thanks for this Jason. It’s very helpful.

  7. Avatar
    Ntate Ndaba April 13, 2017 at 12:41 am #

    Hi Jason,

    I was wondering if there any rule for alphas (learner weights) in H(x) = sign(Sum(alpha*h_i(x)) for h_i = 1…n. For example, the sum of alphas must be equal to 1, the value of each alpha can’t be > 1, etc.

    • Avatar
      Jason Brownlee April 13, 2017 at 10:02 am #

      No that I recall offhand. If you discover something in the literature, let me know.

      • Avatar
        Ntate Ndaba April 13, 2017 at 6:54 pm #

        Thank you. Will do.

  8. Avatar
    Yi June 22, 2017 at 12:28 am #

    Given the stage = ln((1-error) / error), the stage can be negative value if the error > 0.5. Since the stage will be used as the weight of the model, can it be a negative value?

    • Avatar
      Yi June 22, 2017 at 10:31 am #

      I guess I found my answer and share it here:

      The stage value can be negative when the error rate > 0.5. In this case, the weight of the weak classifier become nagative, which means we flip its predication so that what the weak classifier says true become false and vice versa. In term of the misclassified data points from the weak classifier, since we flip the sign of the weak classifier, those data points are actually classified correctly. As compensation, we reduce their weight by w*exp(negative value)….

  9. Avatar
    Lalit June 29, 2017 at 1:20 am #

    Hi,
    At first you said :-
    Each instance in the training dataset is weighted. weight(xi) = 1/n
    So every instance will have same weight , right ?

    then in example , you said :-
    “For example, if we had 3 training instances with the weights 0.01, 0.5 and 0.2.”

    How come above 3 instances have different weights and not same weights from (1/n formula = .33,.33,.33)

    • Avatar
      Ali July 28, 2017 at 1:22 am #

      As Jason mentions in the article, the weights get updated based on w = w * exp(stage * terror) before the next classifier is applied.

  10. Avatar
    Julia June 29, 2017 at 6:25 am #

    This article is really helpful! I searched a lot for an introduction to Adaboost. This is the best!

  11. Avatar
    Hossein Sayadi August 4, 2017 at 2:24 am #

    Hi Jason,
    Thanks for the perfect description. I have a question! Is Adaboost or generally boosting methods for improving the accuracy of weak learners from on type? I mean, if we have different types of learners that are not doing accurately, can we use boosting for them and should? Basically the question is that in this case, should we apply boosting on each learning type individually or we should consider all the weak learner (even from different types) together and apply one boosting ?

    Thanks,

    • Avatar
      Jason Brownlee August 4, 2017 at 7:01 am #

      It was designed for decision trees (decision stumps).

      You can try on other models if you like.

    • Avatar
      yusuf December 4, 2019 at 9:42 pm #

      If the weak classifiers do not have much consensus about wrong predictions you can.

      For example, you have 3 classifiers c1, c2, c3 and all of them classifies a sample wrongly. So there is no way for you to make a correct classification.

      On the other hand, if at least one of them makes a correct classification for a sample, you might correctly classify the sample.

      I think that’s the main reason people want weak classifiers. If you use let’s say 3 strong classifiers there is more chance that there are samples which are wrongly classified in all 3 classifiers.

  12. Avatar
    Kevin October 30, 2017 at 8:12 am #

    Hi Jason,
    That is a nice explanation, thanks. How exactly do the weights affect the tree learning algorithm? Do they feed directly into the entropy calculations for information gain? I assume they change the way the probabilities are calculated, but I can’t find a good explanation of how.

    • Avatar
      Jason Brownlee October 30, 2017 at 3:48 pm #

      Exactly, the weights influence the splitting criteria used to select the split point of subsequent weak learners. It’s a way of weighting the model away from “easy” parts of the problem and toward “harder” parts, as defined by the parameter values for split points.

  13. Avatar
    hobart November 1, 2017 at 5:09 pm #

    When I try to practice Adaboost, I found I am still not clear about “how to build a new weak classifier by weighted training data”. You mentioned weights are used in computing errors, but let me say we use ID3 or CART, they are not error driven. How do we use weighted training data to build a new ID3/CART?

    Thanks!
    (P.S. I found myself visit this place more and more often. Many thanks for very helpful information.)

    • Avatar
      Jason Brownlee November 2, 2017 at 5:07 am #

      Unlike CART and ID#, boosting will create multiple trees consecutively. Each tree is built on a weighted form of the dataset to ensure that the tree focuses on areas where the previous tree/trees performed poorly. The subsequent trees try to correct the errors of the prior trees.

      I hope that helps.

      • Avatar
        Hobart November 2, 2017 at 3:28 pm #

        Not really ~~~
        question 1: How do we build CART stump or ID# stump (in adaboost)? Or they are not option?
        question 2: I see some examples use very simple formula like (x5.5) as a stumps, they do use weight to measure error, but is this the only thing we can use in adaboost?

        • Avatar
          Jason Brownlee November 2, 2017 at 3:59 pm #

          You can build a stump with one split point. The weight can be used to influence the choice of split point.

          Yes, this is the Adaboost algorithm.

          I do have a worked example in a spreadsheet in my “master machine learning algorithms” book.

          • Avatar
            Hobart November 2, 2017 at 4:25 pm #

            Many thanks Jason!
            I was thinking a typical adaboost shall use some “advanced” algs as stumps. For example if I try to use adaboost in kaggle. Your guidence saves me a lot time.

  14. Avatar
    San December 30, 2017 at 1:59 am #

    Hello,
    is there algorithm in ensemble can work in similarity?

    • Avatar
      Jason Brownlee December 30, 2017 at 5:25 am #

      You could create an ensemble of similarity algorithms, sure.

  15. Avatar
    Mohammad December 30, 2017 at 2:42 am #

    Hello,
    is there algorithm in ensemble techniques work as similarity for predict based on specific rules?

  16. Avatar
    Mohammad December 30, 2017 at 9:37 am #

    Attribute 1 Attribute 2 Attribute 3 Class label
    Class1 4 5 5
    Class 2 2 3 1
    Class 3 4 5 5
    Class 4 1 3 5
    Class5 2 3 1
    Class 6 4 5 5

    now class label the value of it 1 or 0
    1 if the vale of the attributes is the same, 0 if no one have same value
    ex. class 1 and class 3 and class 6 are same
    class label value is 1
    can i use ensemble techniques to get this results ?

    • Avatar
      Jason Brownlee December 31, 2017 at 5:20 am #

      Sorry, I don’t follow, perhaps can you explain your question a different way?

  17. Avatar
    Haebi April 6, 2018 at 12:26 pm #

    Hi Jason, shouldn’t the terror be -1 or 1? Since in the update rule, (W = W * exp(stage * y_i * h_t)). if the y_i is -1, and h_t is -1 (thus correct classification), their multiplication would yield +1, while an incorrect classification would lead to -1?

    This article seems to suggest the same (https://towardsdatascience.com/boosting-algorithm-adaboost-b6737a9ee60c), when it says: “If a misclassified case is from a positive weighted classifier, the “exp” term in the numerator would be always larger than 1 (y*f is always -1, theta_m is positive).”

    In manual calculation, setting terror to 0 for correct classification instead of 1 would lower the total exp/euler’s value, while setting terror to 1 for incorrect classification instead of -1 would conversely raise the total euler’s value, which is not what we want. Am I missing something here?

  18. Avatar
    Brenda April 8, 2018 at 10:53 am #

    hey Jason, have you an example for AdaBoost M1?, and thank you for your article in my research for AdaBoost this is my favorite 🙂

  19. Avatar
    Arun Athavale April 11, 2018 at 9:05 am #

    I am competing in challenge for binary classification problem. Data is 250k for training. I need to produce probabilities. I find every time I run probabilities change. That makes it problematic as I cannot have results that I reproduce. Any thoughts

  20. Avatar
    reys April 16, 2018 at 5:40 pm #

    Can u suggest how we can implement adaboost on SVR predicted model?

    • Avatar
      reys April 16, 2018 at 5:42 pm #

      ValueError Traceback (most recent call last)
      in ()
      9 print(X_test)
      10 #print(y_test)
      —> 11 er_tree = generic_clf(y_train, X_train, y_test, X_test, clf_tree)

      in generic_clf(y_train, X_train_scaled, y_test, X_test_scaled, clf)
      9 print(pred_train)
      10 print(pred_test)
      —> 11 clf.fit(X_train_scaled,y_train)
      12 pred_train = clf.predict(X_train_scaled)
      13 pred_test = clf.predict(X_test_scaled)

      ~/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
      788 sample_weight=sample_weight,
      789 check_input=check_input,
      –> 790 X_idx_sorted=X_idx_sorted)
      791 return self
      792

      ~/anaconda3/lib/python3.6/site-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
      138
      139 if is_classification:
      –> 140 check_classification_targets(y)
      141 y = np.copy(y)
      142

      ~/anaconda3/lib/python3.6/site-packages/sklearn/utils/multiclass.py in check_classification_targets(y)
      170 if y_type not in [‘binary’, ‘multiclass’, ‘multiclass-multioutput’,
      171 ‘multilabel-indicator’, ‘multilabel-sequences’]:
      –> 172 raise ValueError(“Unknown label type: %r” % y_type)
      173
      174

      ValueError: Unknown label type: ‘continuous’

      my code showing error like this……..

      • Avatar
        Jason Brownlee April 17, 2018 at 5:55 am #

        I’m sorry to hear that.

        What code are you trying to run?

    • Avatar
      Jason Brownlee April 17, 2018 at 5:54 am #

      Not off hand, sorry, some careful development would be required. Also, I don’t think SVR would be a good fit as a weak model in adaboost.

  21. Avatar
    Tonoy Biswas June 2, 2018 at 5:05 am #

    It is very helpful Jason for my masters thesis write up. Thanks for your contribution.

  22. Avatar
    sana July 30, 2018 at 4:47 am #

    sir please provide code for adaboost and random forest ensemble classfiiers

  23. Avatar
    Ntate Ndaba August 28, 2018 at 6:06 am #

    Hi Jason. I’m back to this question after almost 2 years. I was wondering, how do you ‘select’ an instance to be part of the training set using their weights (probabilities). Do you sort the instances based on their weights and then only take those with higher weight as part of your next training set (for the next classifier) ?.

    • Avatar
      Ntate Ndaba August 28, 2018 at 7:39 am #

      Ok. I verified if the answer I had was correct. I used a simple technique for weighted random number generator without replacement.

  24. Avatar
    Kamohelo August 29, 2018 at 12:54 am #

    Hi Jason

    How do you decide to split the training set when you select examples with high probability. Do you make the selected sample from the training set be of the same size as the training set by putting back to the training set each example that has a higher probability? or you only take a subset of the training set (e.g. take only the top 70% examples for training) then classify the whole training set afterwards to get the training error and update their weights ?.

    • Avatar
      Jason Brownlee August 29, 2018 at 8:12 am #

      It really depends on the problem.

      Your goal is to have a train and test set that are representative of the broader dataset.

  25. Avatar
    Kamohelo August 29, 2018 at 1:47 am #

    Just to update my question:

    The algorithm states that the instances in the training set are weighted each time they are correctly classified or misclassified to ensure that subsequent learners focus on instances that are hard to classify. My question is, how many of those hard to classify instances from the training set are selected to train a weak classifier. Do you take 70%/50%/30%, etc. of those hard to classify instances and use them to train the weak learner in the next iteration? or do you use the entire weighted training set. If the answer is the latter, how does the weak learner utilize instance weights so that it can focus on those that have a bigger weight value ?.

  26. Avatar
    Sree September 11, 2018 at 5:29 am #

    Hi Jason should not there be a 0.5 multiplied to the stage value? I find the same formula with 0.5 multiplied with the log value

    • Avatar
      Jason Brownlee September 11, 2018 at 6:33 am #

      Perhaps it would be a bad idea to mix and match formula. Pick one approach.

  27. Avatar
    Ankita Tulshyan October 16, 2018 at 9:40 pm #

    Hi Jason,
    Thanks a lot for the explanation with such simplicity.

    I have a question,
    1) In terms of the stability of the predictions of the model, how would you compare AdaBoost with XGBoost and random forest, I understand these are different implementations of decision tree, please correct me if I’m wrong.

    Thanks.

    • Avatar
      Jason Brownlee October 17, 2018 at 6:52 am #

      They are all ensemble methods and use decision trees.

      Comparison depends on the data you have available. Perhaps repeated cross validation would be a good start?

      • Avatar
        Ankita Tulshyan November 15, 2018 at 5:58 pm #

        Thanks, will try that.

  28. Avatar
    KiddingMe December 15, 2018 at 6:52 pm #

    Hi Jason, thanks a lot for sharing the knowledge!

    A question about stage value and making predictions. Since stage = ln((1-error) / error), and in reality we could get a predictor that has error > 0.5 and hence stage < 0. In this case, the predictor is not only "weak" but also "bad" (because it is even worse than a random guess if the training dataset is balanced.) Should we include this predictor in the prediction vote and assign it a negative stage to ask it contributing negatively to the final score(imagine we have a "horrible" predictor with error = 1, then it is perfect in the sense that we can make opposite predictions based on it), or just simply discard it because it does not belong to the group of predictors "slightly better than random guess"?

    • Avatar
      Jason Brownlee December 16, 2018 at 5:21 am #

      Yes, include them all, you don’t really want to introduce a bias into the process.

  29. Avatar
    joe December 16, 2018 at 5:33 am #

    Hi, how can you make adaboost from scratch using python?

  30. Avatar
    Sarah January 28, 2019 at 12:34 am #

    How is it used for detecting frauds specially in credit card transactions

  31. Avatar
    shobha March 30, 2019 at 7:27 pm #

    Does the imbalanced data affect boosting algorithm performance?

    • Avatar
      Jason Brownlee March 31, 2019 at 9:28 am #

      It may. You must choose your metrics for model evaluation carefully.

  32. Avatar
    Rian April 5, 2019 at 9:34 am #

    Hi, thank you for the explanation. I want to ask to determine the e_t we us D or Dt?

    D is training data and Dt is sample data based on weight.

    • Avatar
      Jason Brownlee April 5, 2019 at 2:00 pm #

      Sorry, I don’t know what you are referring to?

      • Avatar
        Rian April 6, 2019 at 2:02 pm #

        When we calculate the error rate, we use whole training data or sample data that we use to train our model? And also what happen when the weighted sum of predictions equals to 0?

        • Avatar
          Jason Brownlee April 7, 2019 at 5:25 am #

          Yes, the whole training dataset.

          • Avatar
            Rian April 9, 2019 at 12:14 pm #

            What happen when the weighted sum of predictions equals to 0?

  33. Avatar
    John June 5, 2019 at 9:34 pm #

    Hi, I have a question. AdaBoost algorithm with any type of weak classifiers or combination of some weak classifiers, eventually reach to zero training error?

    • Avatar
      Jason Brownlee June 6, 2019 at 6:29 am #

      Probably, but it will have overfit the training dat.

      All models will have some random error that we cannot (and do not want to) capture.

  34. Avatar
    Pratik Anand October 4, 2019 at 3:05 am #

    What do you really mean by more weight while training a new model?

    I think more weight means the probability of choosing that instance is high while sampling for training the model.
    if above is correct then is it really make any difference if you give the same instances many times to model while training.

    • Avatar
      Jason Brownlee October 4, 2019 at 5:48 am #

      As in a weighting of the input, a coefficient applied to the input to increase or decrease the importance in the decision making.

  35. Avatar
    Jesse April 19, 2020 at 1:37 am #

    Even after 4 years, there still isn’t another article that is as simple and clear as this.

  36. Avatar
    Aditya Pradhan January 4, 2021 at 8:55 pm #

    Hi I am unable to understand how the weights on the correctly classified data are reduced? The terror is zero for them so the exponential term becomes and updated w = w. What am I missing?

    • Avatar
      Jason Brownlee January 5, 2021 at 6:22 am #

      Perhaps the resources in the further reading section will provide a helpful alternate description for you?

  37. Avatar
    Julian January 17, 2021 at 1:46 pm #

    Jason, hi!

    There is something that’s been bugging me for a while. Is it possible for the algorithm to fall into some kind of loop in which one classifier misclassifies some points, and the next one corrects these points, but now missclasifies the ones that were previously well classified? Or

    • Avatar
      Jason Brownlee January 18, 2021 at 6:04 am #

      Good question.

      Perhaps. This might be an interesting aspect of the algorithm to investigate.

  38. Avatar
    Data Quest April 17, 2021 at 12:57 pm #

    Hi Jason , plz tell what is the use of misclassification rate?

    • Avatar
      Jason Brownlee April 18, 2021 at 5:51 am #

      It gives you an idea of how well (or not) your model is performing.

  39. Avatar
    pritam June 23, 2021 at 5:37 pm #

    Hi,

    As i am doing research related to the air sea interaction.

    If we don’t know the relation between the two or more parameters, can I used adaboost technique to find the teleconnection between the two parameter, like how the two parameters are related to each other?

    or

    SOM will be useful for such scenario.

    • Avatar
      Jason Brownlee June 24, 2021 at 6:00 am #

      Perhaps prototype a model and compare the results to other methods.

  40. Avatar
    waleed August 7, 2021 at 10:22 pm #

    What do these abbreviations HLSTA and HLSPA mean?

  41. Avatar
    Aayush Shah September 30, 2021 at 6:48 pm #

    In the stage formulae, ln((1 – error) / error) if the calculation inside ln() comes negative, it gives the nan as result. Is that correct? Can we use abs() like this → ln(abs((1 – error) / error)) ?

    Thanks for an amazing article!

    • Adrian Tam
      Adrian Tam October 1, 2021 at 12:38 pm #

      Correct. Error in this case should be less than 1 always. If not, your function can prevent NaN but it also means your function is not monotonic to the error. This is usually not what we wanted.

  42. Avatar
    Random person October 14, 2022 at 3:36 pm #

    For error calculation you’ve mentioned error = (correct-N)/N but this quantity will be negative in most cases. Shouldn’t error be = (N-correct)/N

    • Avatar
      James Carmichael October 15, 2022 at 6:06 am #

      Thank you for your feedback! The absolute value of the error should be considered.

  43. Avatar
    Greg February 1, 2023 at 1:26 pm #

    Is this the same as logitboost?

Leave a Reply