A Gentle Introduction to the Bootstrap Method

By Jason Brownlee on August 8, 2019 in Statistics

The bootstrap method is a resampling technique used to estimate statistics on a population by sampling a dataset with replacement.

It can be used to estimate summary statistics such as the mean or standard deviation. It is used in applied machine learning to estimate the skill of machine learning models when making predictions on data not included in the training data.

A desirable property of the bootstrap is that the estimated model skill can be presented with confidence intervals, a feature not readily available with other methods such as cross-validation.

In this tutorial, you will discover the bootstrap resampling method for estimating the skill of machine learning models on unseen data.

After completing this tutorial, you will know:

The bootstrap method involves iteratively resampling a dataset with replacement.
That when using the bootstrap you must choose the size of the sample and the number of repeats.
The scikit-learn library provides a function that you can use to resample a dataset for the bootstrap method.

Kick-start your project with my new book Statistics for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Tutorial Overview

This tutorial is divided into 4 parts; they are:

Bootstrap Method
Configuration of the Bootstrap
Worked Example
Bootstrap API

Bootstrap Method

The bootstrap method is a statistical technique for estimating quantities about a population by averaging estimates from multiple small data samples.

Importantly, samples are constructed by drawing observations from a large data sample one at a time and returning them to the data sample after they have been chosen. This allows a given observation to be included in a given small sample more than once. This approach to sampling is called sampling with replacement.

The process for building one sample can be summarized as follows:

Choose the size of the sample.
While the size of the sample is less than the chosen size
1. Randomly select an observation from the dataset
2. Add it to the sample
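
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name draw_bootstrap_sample and the seed value are assumptions made for the example, not part of any library.

```python
import random

def draw_bootstrap_sample(data, size, rng):
    """Draw `size` observations from `data` with replacement (hypothetical helper)."""
    sample = []
    while len(sample) < size:
        # choice() never removes the observation from `data`,
        # so the same value can be selected again later.
        sample.append(rng.choice(data))
    return sample

rng = random.Random(1)  # fixed seed so the draw is repeatable
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
sample = draw_bootstrap_sample(data, 4, rng)
print(sample)
```

Because each observation is returned to the dataset after it is chosen, the sample may contain repeated values even though the dataset does not.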

The bootstrap method can be used to estimate a quantity of a population. This is done by repeatedly taking small samples, calculating the statistic, and taking the average of the calculated statistics. We can summarize this procedure as follows:

Choose a number of bootstrap samples to perform
Choose a sample size
For each bootstrap sample
1. Draw a sample with replacement with the chosen size
2. Calculate the statistic on the sample
Calculate the mean of the calculated sample statistics.
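
As a rough sketch, the procedure might look like the following in Python. The helper name bootstrap_estimate, the choice of 1,000 repeats, and the use of the mean as the statistic are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_estimate(data, statistic, n_repeats, sample_size, seed=None):
    """Average a statistic over repeated samples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        # choices() samples with replacement
        sample = rng.choices(data, k=sample_size)
        estimates.append(statistic(sample))
    return statistics.mean(estimates), estimates

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
estimate, estimates = bootstrap_estimate(
    data, statistics.mean, n_repeats=1000, sample_size=len(data), seed=1
)
print('Bootstrap estimate of the mean: %.3f' % estimate)
```

The list of per-sample estimates is returned alongside the final value because it is the raw material for measures of spread and confidence intervals.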

The procedure can also be used to estimate the skill of a machine learning model.

The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.

— Page 187, An Introduction to Statistical Learning, 2013.

This is done by training the model on the bootstrap sample and evaluating its skill on the observations that were not included in that sample. These left-out observations are called the out-of-bag sample, or OOB for short.

This procedure of using the bootstrap method to estimate the skill of the model can be summarized as follows:

Choose a number of bootstrap samples to perform
Choose a sample size
For each bootstrap sample
1. Draw a sample with replacement with the chosen size
2. Fit a model on the data sample
3. Estimate the skill of the model on the out-of-bag sample.
Calculate the mean of the sample of model skill estimates.
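
A minimal sketch of this loop is given below. It assumes a synthetic classification dataset, a decision tree as the model, accuracy as the skill measure, and 50 repeats; none of these choices come from the procedure itself. Rows are resampled by index so that the out-of-bag rows are easy to identify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset (an assumption for the example)
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

rng = np.random.default_rng(1)
n = len(X)
scores = []
for _ in range(50):
    # Draw row indices with replacement; sample size equals dataset size
    train_idx = rng.integers(0, n, size=n)
    # Rows that were never drawn form the out-of-bag sample
    oob_mask = np.ones(n, dtype=bool)
    oob_mask[train_idx] = False
    if not oob_mask.any():
        continue  # extremely unlikely: every row was drawn
    model = DecisionTreeClassifier(random_state=1)
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[oob_mask])
    scores.append(accuracy_score(y[oob_mask], y_pred))

print('Mean OOB accuracy: %.3f (std %.3f)' % (np.mean(scores), np.std(scores)))
```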

The samples not selected are usually referred to as the “out-of-bag” samples. For a given iteration of bootstrap resampling, a model is built on the selected samples and is used to predict the out-of-bag samples.

— Page 72, Applied Predictive Modeling, 2013.

Importantly, any data preparation prior to fitting the model, or any tuning of the model hyperparameters, must occur within the for-loop on the data sample. This is to avoid data leakage, where knowledge of the test dataset is used to improve the model. This, in turn, can result in an optimistic estimate of the model skill.
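
One way to keep data preparation inside the loop is to wrap it with the model in a scikit-learn pipeline, so that the preparation is fit only on the bootstrap sample. The sketch below assumes standardization as the preparation step, logistic regression as the model, and a synthetic dataset; these are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in dataset (an assumption for the example)
X, y = make_classification(n_samples=200, n_features=5, random_state=1)

rng = np.random.default_rng(1)
n = len(X)
scores = []
for _ in range(30):
    train_idx = rng.integers(0, n, size=n)           # rows drawn with replacement
    oob_idx = np.setdiff1d(np.arange(n), train_idx)  # rows never drawn
    # The scaler is created and fit inside the loop, on the bootstrap sample only,
    # so no statistics from the out-of-bag rows leak into training.
    pipeline = make_pipeline(StandardScaler(), LogisticRegression())
    pipeline.fit(X[train_idx], y[train_idx])
    scores.append(pipeline.score(X[oob_idx], y[oob_idx]))

print('Mean OOB accuracy: %.3f' % np.mean(scores))
```

Fitting the scaler on the whole dataset before the loop would be an example of the leakage described above, because the out-of-bag rows would influence the scaling applied to the training rows.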

A useful feature of the bootstrap method is that the resulting sample of estimates often forms an approximately Gaussian distribution. In addition to summarizing this distribution with a measure of central tendency, measures of variance can be given, such as the standard deviation and standard error. Further, a confidence interval can be calculated and used to bound the presented estimate. This is useful when presenting the estimated skill of a machine learning model.
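
As an illustration, the sketch below summarizes a sample of bootstrap estimates with a mean, a standard error, and a 95% interval. It assumes the percentile method for the interval, which is one common choice among several, and uses randomly generated stand-in data.

```python
import random
import statistics

rng = random.Random(1)
# Stand-in data: 100 hypothetical measurements (an assumption for the example)
data = [rng.gauss(50, 10) for _ in range(100)]

# Bootstrap distribution of the mean
estimates = [
    statistics.mean(rng.choices(data, k=len(data)))
    for _ in range(1000)
]

center = statistics.mean(estimates)
# The standard deviation of the bootstrap estimates serves as
# a bootstrap estimate of the standard error of the statistic.
std_error = statistics.stdev(estimates)

# 95% percentile interval: the 2.5th and 97.5th percentiles of the estimates.
# quantiles(n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
cut_points = statistics.quantiles(estimates, n=40)
lower, upper = cut_points[0], cut_points[-1]

print('Estimate: %.2f, standard error: %.2f' % (center, std_error))
print('95%% interval: [%.2f, %.2f]' % (lower, upper))
```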

Configuration of the Bootstrap

There are two parameters that must be chosen when performing the bootstrap: the size of the sample and the number of repetitions of the procedure to perform.

Sample Size

In machine learning, it is common to use a sample size that is the same as the original dataset.

The bootstrap sample is the same size as the original dataset. As a result, some samples will be represented multiple times in the bootstrap sample while others will not be selected at all.

— Page 72, Applied Predictive Modeling, 2013.

If the dataset is enormous and computational efficiency is an issue, smaller samples can be used, such as 50% or 80% of the size of the dataset.
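
The effect of the sample size can be checked with a short simulation. The sketch below counts how many distinct observations end up in a bootstrap sample for a few sample sizes; the dataset size of 10,000 and the helper name unique_fraction are assumptions for the example. For a sample the same size as the dataset, the expected fraction is 1 - (1 - 1/n)^n, which is close to 0.632 for large n.

```python
import random

def unique_fraction(n, sample_size, rng):
    """Fraction of the n original observations that appear at least once."""
    # Draw indices with replacement and keep only the distinct ones
    drawn = {rng.randrange(n) for _ in range(sample_size)}
    return len(drawn) / n

rng = random.Random(1)
n = 10000  # hypothetical dataset size
fractions = {}
for proportion in (0.5, 0.8, 1.0):
    fractions[proportion] = unique_fraction(n, int(proportion * n), rng)
    print('sample size %d%% of dataset: %.3f of observations selected'
          % (proportion * 100, fractions[proportion]))

# Theoretical value for a full-size sample
expected_full = 1 - (1 - 1 / n) ** n
print('Expected for full-size sample: %.3f' % expected_full)
```

Smaller samples leave more observations out-of-bag, which gives a larger test set but less data for fitting each model.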

Repetitions

The number of repetitions must be large enough to ensure that meaningful statistics, such as the mean, standard deviation, and standard error can be calculated on the sample.

A minimum might be 20 or 30 repetitions. Smaller values can be used, but they will add further variance to the statistics calculated on the sample of estimated values.

Ideally, the sample of estimates would be as large as possible given the time and compute resources available, with hundreds or thousands of repeats.
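
The sketch below gives a rough feel for why more repetitions help. It runs the whole bootstrap 20 times with 30 repeats and 20 times with 1,000 repeats, then compares how much the final estimate varies from run to run. The stand-in data and the specific counts are assumptions for illustration.

```python
import random
import statistics

def bootstrap_mean(data, n_repeats, rng):
    """Final bootstrap estimate of the mean from n_repeats resamples."""
    n = len(data)
    estimates = [sum(rng.choices(data, k=n)) / n for _ in range(n_repeats)]
    return sum(estimates) / n_repeats

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(50)]  # stand-in dataset

spread = {}
for n_repeats in (30, 1000):
    # Repeat the entire bootstrap 20 times and see how much the answer moves
    finals = [bootstrap_mean(data, n_repeats, rng) for _ in range(20)]
    spread[n_repeats] = statistics.stdev(finals)
    print('%d repeats: run-to-run std of estimate = %.4f'
          % (n_repeats, spread[n_repeats]))
```

With more repetitions, the final estimate should change less between runs, at the cost of more computation.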

Worked Example

We can make the bootstrap procedure concrete with a small worked example. We will work through one iteration of the procedure.

Imagine we have a dataset with 6 observations:

[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

The first step is to choose the size of the sample. Here, we will use 4.

Next, we must randomly choose the first observation from the dataset. Let’s choose 0.2.

sample = [0.2]

This observation is returned to the dataset and we repeat this step 3 more times.

sample = [0.2, 0.1, 0.2, 0.6]

We now have our data sample. The example purposefully demonstrates that the same value can appear zero, one or more times in the sample. Here the observation 0.2 appears twice.

An estimate can then be calculated on the drawn sample.

statistic = calculation([0.2, 0.1, 0.2, 0.6])

Those observations not chosen for the sample may be used as out-of-bag observations.

oob = [0.3, 0.4, 0.5]

In the case of evaluating a machine learning model, the model is fit on the drawn sample and evaluated on the out-of-bag sample.

train = [0.2, 0.1, 0.2, 0.6]
test = [0.3, 0.4, 0.5]
model = fit(train)
statistic = evaluate(model, test)

That concludes one repeat of the procedure. It can be repeated 30 or more times to give a sample of calculated statistics.

statistics = [...]

This sample of statistics can then be summarized by calculating a mean, standard deviation, or other summary values to give a final usable estimate of the statistic.

estimate = mean([...])

Bootstrap API

We do not have to implement the bootstrap method manually. The scikit-learn library provides an implementation that will create a single bootstrap sample of a dataset.

The resample() scikit-learn function can be used. It takes as arguments the data array, whether or not to sample with replacement, the size of the sample, and the seed for the pseudorandom number generator used prior to the sampling.

For example, we can create a bootstrap sample with replacement that contains 4 observations, using a seed of 1 for the pseudorandom number generator.

boot = resample(data, replace=True, n_samples=4, random_state=1)

Unfortunately, the API does not include any mechanism to easily gather the out-of-bag observations that could be used as a test set to evaluate a fit model.

At least in the univariate case we can gather the out-of-bag observations using a simple Python list comprehension.

# out of bag observations
oob = [x for x in data if x not in boot]
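
The list comprehension compares values, so it can misbehave when the dataset contains repeated values or when each observation is a row of several features. One possible workaround, sketched below, is to resample row indices rather than the rows themselves. The two-column dataset is made up for the example.

```python
import numpy as np
from sklearn.utils import resample

# Made-up dataset with two features per row and one repeated row
data = np.array([
    [0.1, 1.0],
    [0.2, 2.0],
    [0.2, 2.0],
    [0.3, 3.0],
    [0.4, 4.0],
    [0.5, 5.0],
])

indices = np.arange(len(data))
# Resample the row indices with replacement
boot_idx = resample(indices, replace=True, n_samples=len(indices), random_state=1)
# Indices that were never drawn identify the out-of-bag rows
oob_idx = np.setdiff1d(indices, boot_idx)

boot = data[boot_idx]
oob = data[oob_idx]
print('Bootstrap rows: %s' % boot_idx)
print('OOB rows: %s' % oob_idx)
```

Working with indices keeps a repeated row in the out-of-bag sample when only its twin was drawn, which a value comparison would not.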

We can tie all of this together with our small dataset used in the worked example of the prior section.

# scikit-learn bootstrap
from sklearn.utils import resample
# data sample
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# prepare bootstrap sample
boot = resample(data, replace=True, n_samples=4, random_state=1)
print('Bootstrap Sample: %s' % boot)
# out of bag observations
oob = [x for x in data if x not in boot]
print('OOB Sample: %s' % oob)

Running the example prints the observations in the bootstrap sample and those in the out-of-bag sample.

Bootstrap Sample: [0.6, 0.4, 0.5, 0.1]
OOB Sample: [0.2, 0.3]
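
Because random_state is fixed at 1, the example returns the same sample every time it is run. To draw several different bootstrap samples, one option is to call resample() in a loop and vary the seed, as in the sketch below. Using the loop counter as the seed and drawing 5 samples are assumptions for the example.

```python
from sklearn.utils import resample

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

results = []
for i in range(5):
    # A different seed per iteration gives a different, but repeatable, sample
    boot = resample(data, replace=True, n_samples=4, random_state=i)
    # Value-based OOB is safe here because all values in data are unique
    oob = [x for x in data if x not in boot]
    results.append((boot, oob))
    print('Repeat %d: bootstrap=%s, oob=%s' % (i, boot, oob))
```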

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

List 3 summary statistics that you could estimate using the bootstrap method.
Find 3 research papers that use the bootstrap method to evaluate the performance of machine learning models.
Implement your own function to create a sample and an out-of-bag sample with the bootstrap method.

If you explore any of these extensions, I’d love to know.

Summary

In this tutorial, you discovered the bootstrap resampling method for estimating the skill of machine learning models on unseen data.

Specifically, you learned:

The bootstrap method involves iteratively resampling a dataset with replacement.
That when using the bootstrap you must choose the size of the sample and the number of repeats.
The scikit-learn library provides a function that you can use to resample a dataset for the bootstrap method.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

106 Responses to A Gentle Introduction to the Bootstrap Method

Rafael May 25, 2018 at 10:34 am #

Great post, Jason! Helped me a lot

- Jason Brownlee May 25, 2018 at 2:52 pm #
  
  I’m glad to hear that.
  
  - ww December 7, 2020 at 3:13 am #
    
    You forgot to add that you work on R or R studio, which makes it easier for you to know the program when you know the program, on foot, it’s a different ball, which the program will not show to a layman.
    
    - Jason Brownlee December 7, 2020 at 6:19 am #
      
      I do not work for R or R studio and never have.
      
    - Kenneth July 22, 2021 at 11:13 am #
      
      Describe how to use bootstrap to estimate mean square error
      
      - Jason Brownlee July 23, 2021 at 5:44 am #
        
        Draw a sample, estimate your metric, repeat, average the scores.
Vladislav Gladkikh May 25, 2018 at 2:36 pm #

One more book:

Michael R. Chernick, Robert A. LaBudde. An Introduction to Bootstrap Methods with Applications to R (2011) https://www.amazon.com/Introduction-Bootstrap-Methods-Applications/dp/0470467045

Papers:

Yoram Reich, S.V.Barai. Evaluating machine learning models for engineering problems https://www.sciencedirect.com/science/article/pii/S0954181098000211

Gordon C. S. Smith, Shaun R. Seaman, Angela M. Wood, Patrick Royston, Ian R. White. Correcting for Optimistic Prediction in Small Data Sets https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108045/

- Jason Brownlee May 25, 2018 at 2:53 pm #
  
  Wonderful references, thanks Vladislav.
  
Luis Ibarra May 26, 2018 at 1:34 am #

Thanks to this post i can finally understand the difference between K-Cross validation and Bootstrap, thanks for the clear explanation.

- Jason Brownlee May 26, 2018 at 6:00 am #
  
  I’m glad to hear that.
  
Michał July 7, 2018 at 3:59 pm #

Hi Jason,

a very good post. Could you extend it with a bit of explanation/example on how to calculate confidence intervals at the end, e.g. for a bootstrap-calculated mean?

- Jason Brownlee July 8, 2018 at 6:16 am #
  
  See this post:
  https://machinelearningmastery.com/confidence-intervals-for-machine-learning/
  
  And this post:
  https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/
  
Mahmood July 11, 2018 at 5:33 pm #

Thank you very much, Jason, for a wonderful topic. It helped me a lot to understand the concept.

- Jason Brownlee July 12, 2018 at 6:23 am #
  
  I’m glad to hear that Mahmood.
  
jerry July 27, 2018 at 11:15 pm #

Thanks for this post. I was expecting (going over ISLR’s bootstrap labs) a bootstrap method in sklearn (or numpy, pandas). Thanks for the explanation. You may also want to mention the pandas resample method, useful for converting monthly to quarterly observations.

Not sure what the sklearn.cross-validation.bootstrap is doing.

- Jason Brownlee July 28, 2018 at 6:35 am #
  
  Thanks Jerry.
  
gaurav srivastava August 7, 2018 at 8:24 am #

Hi Jason,

Thanks for the post. I understand what bootstrapping is in machine learning. I am confused about the difference between bootstrapping and repeated random sub-sampling cross-validation (https://en.wikipedia.org/wiki/Cross-validation_(statistics)#Repeated_random_sub-sampling_validation). To me both seem the same. First, randomly create a sub-sample from the given data and train the model on it. Next, validate the model on the left-out sample. Repeat the process some number of times. The final validation error would be an estimate from each of these iterations. Please let me know what the difference is.

One difference I can think of is bootstrapping samples with replacement and repeated random sub-sampling method does not repeat the sample. Is this the only difference?

Thanks,
Gaurav

- Jason Brownlee August 7, 2018 at 2:30 pm #
  
  Selection with replacement might be the main difference.
  
  - Ernst Kloppenburg October 14, 2020 at 12:00 am #
    
    Jason, I feel that in the approach shown in your post the concepts of bootstrapping and repeated random sub-sampling cross-validation are somehow intermingled.
    
    I think one should follow the following approach, at least for an introduction:
    – first do the train-test split of the data and train the model (one model, not multiple ones)
    – then use the bootstrap for the model skill assessment. Here only the test data x, consisting of n points, will be used. We will sample with replacement n points from x to create bootstrap replicates x*, and assess the model skill on x*. We repeat this B times. The variation will show us how good our measurement of the model skill is, given the test data.
    
    One could possibly also do the “opposite”, i.e. train multiple models on bootstrap replicates of the training data, test each one with the same test data. This time the variation will show how strongly the model skill depends on the training data (similar to what we achieve with cross validation).
    
    - Jason Brownlee October 14, 2020 at 6:20 am #
      
      Thanks for your suggestion.
      
Alireza Hajian November 19, 2018 at 8:45 am #

Very useful Jason..it’s easy to understand the concepts..thx a lot

- Jason Brownlee November 19, 2018 at 2:19 pm #
  
  Thanks. I’m happy to hear that.
  
Kingsley Udeh December 10, 2018 at 11:27 pm #

Hi Jason,

How could one apply bootstrap method to time series data?

Thanks

- Jason Brownlee December 11, 2018 at 7:44 am #
  
  Hmmm. Perhaps different amounts of history for the same model, or differently configured models?
  
Kingsley Udeh December 11, 2018 at 4:51 pm #

Thanks for responding. The reason I’m considering a bootstrap strategy is simply that I do not have sufficient data (time series) for fitting and validating my models. Thus, I need to find a way to expand or augment my current data.

I came across moving block bootstrap method that simply segments the original data in form of blocks, which are resampled individually with replacement, while maintaining the order in the sequence across observations. I was able to increase my data with certain level of confidence interval, but the date index was missing in the bootstrapped data, leaving only a difficult index.

I would appreciate if you could link me to a more concise concept of the time series bootstrap as the article I consulted assumed a certain level of Statistics literacy.

- Jason Brownlee December 12, 2018 at 5:49 am #
  
  Interesting. I’m not familiar with this approach. Let me know how you go with it.
  
  - Ernst Kloppenburg October 13, 2020 at 11:32 pm #
    
    The book by Efron and Tibshirani, “An Introduction to the Bootstrap” does cover the example of resampling a time series. Basically they use some model and then do a resampling of the model errors to create new time series. Have a look in the book – the book is worth studying.
    
    - Jason Brownlee October 14, 2020 at 6:19 am #
      
      Thanks!
      
Kingsley Udeh December 12, 2018 at 6:47 am #

Sure!

Connie L. Ekkens December 13, 2018 at 2:09 pm #

Is it possible to use bootstrapping with purposeful sampling?
Connie

- Jason Brownlee December 14, 2018 at 5:29 am #
  
  The idea is to use random sampling with replacement. If you use non-random sampling, you’ll be adding bias.
  
  Perhaps try it and compare results.
  
lila January 14, 2019 at 7:46 am #

I would like to bootstrap my observations to estimate the NARDL model, could you help me please to create a program or simply guide me??

- Jason Brownlee January 14, 2019 at 11:15 am #
  
  What is the NARDL model?
  
  - lila January 17, 2019 at 10:11 am #
    
    ARDL is an econometric model that comes in two types: linear (ARDL) and nonlinear (NARDL).
    Auto regressive Distributed Lag Models (ARDL) model plays a vital role when comes a need to analyze a economic scenario. In an economy, change in any economic variables may bring change in another economic variables beyond the time. This change in a variable is not what reflects immediately, but it distributes over future periods. Not only macroeconomic variables, other variables such as loss or profit earned by a firm in a year can affect the brand image of an organization over the period.
    
    - Jason Brownlee January 17, 2019 at 1:45 pm #
      
      Thanks for sharing, I have no experience with economic models/methods.
      
      - lila January 22, 2019 at 11:46 pm #
        
        ok. thank you
hayleedee February 28, 2019 at 10:52 pm #

Hi Jason,
You mention ML model “skill”. I haven’t heard of this term before – is it the same as accuracy? Thanks for the great article, as always.

- Jason Brownlee March 1, 2019 at 6:20 am #
  
  Skill is the aspect of the model’s performance that we/stakeholders care about. It might be accuracy or error.
  
Jack March 16, 2019 at 2:40 am #

The overall idea is really easy to understand, but I don’t quite get the statement “it is common to use a sample size that is the same as the original dataset”. How does that work? There’s no sampling going on if the sample is the same size as the original dataset. Unless “original dataset” means something different than I think it does here. It sounds like this is saying if you have 20 examples in your training set, your sample size should be 20.

- Jack March 16, 2019 at 2:43 am #
  
  Oh right, because of replacement. Still, it seems like using a smaller subset would be more useful intuitively.
  
  - Jason Brownlee March 16, 2019 at 7:58 am #
    
    Correct!
    
    Why would a smaller dataset be more intuitive?
    
- Jason Brownlee March 16, 2019 at 7:58 am #
  
  We are creating samples from the original sample that are the same size as the original sample, but may repeat some examples (e.g. selection with replacement).
  
  Does that help?
  
KK March 22, 2019 at 5:30 am #

Thanks for the post, it really helped me a lot in understanding bootstrapping method.
I am stuck in a problem where I thought I could make use of bootstrap, after understanding the method it doesn’t seem reasonable. Could you please help me with that.
I am doing an image classification algorithms with multiple classes, the dataset is totally imbalanced. For example : class A has 2000 images and Class B has only 100 images. Could you please guide me how could I tackle this, and build a good CNN model?

- Jason Brownlee March 22, 2019 at 8:41 am #
  
  Good question.
  
  I have advice on handling imbalanced data here:
  https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
  
  Perhaps you can use augmentation to oversample the minority class:
  https://machinelearningmastery.com/image-augmentation-deep-learning-keras/
  
  A good place to start with CNNs is transfer learning:
  https://machinelearningmastery.com/transfer-learning-for-deep-learning/
  
KK March 22, 2019 at 2:29 pm #

Thank you so much for the inputs. I’ll go through the methods you suggested.
Adding comments:
So bootstrap method can’t be used to balance the dataset, correct.?
I am currently using transfer learning (vgg16, resnet50) to classify my images. However as the data is largely imbalanced I ain’t able to get the expected results.

Thank you,
KK

- Jason Brownlee March 22, 2019 at 2:39 pm #
  
  Bootstrap is not intended to balance a dataset. Perhaps it can be used for that, I have not seen this use case.
  
  - KK March 22, 2019 at 2:48 pm #
    
    Thanks for the clarification. Now I have better understanding on these methods. I will do oversampling on the minority dataset (flip, add some noise etc) and retrain my transfer learning model again. Will update here if I get interesting outcomes.
    Thanks for the blog again, it is helping me to understand most of the topics.
    
    - Jason Brownlee March 23, 2019 at 9:15 am #
      
      Keen to hear how you go.
      
Freddie April 4, 2019 at 2:40 am #

Hi Jason,
Thank you so much for the post, very helpful!
I have a question about the bootstrapping sample size. Other online resources suggest for statistical inferences the bootstrap sample size should be equal to the original sample size: “The accuracy of statistical estimates depends on the sample size”.
Do you see any risk of taking only one as the bootstrapping sample size?

- Jason Brownlee April 4, 2019 at 7:58 am #
  
  A sample size of 1 is too small, at least 30 would be required I would expect.
  
Koffi Mawuna Koudjonou May 24, 2019 at 5:47 pm #

I appreciate your post!

My question is that if we use bootstrap sampling that way, we will lose the time dependency of our dataset, I guess. How is it useful for machine learning time series predictions?

Thanks.

- Jason Brownlee May 25, 2019 at 7:44 am #
  
  You could fit the model on different subsets of history.
  
mwh July 26, 2019 at 6:54 am #

Thanks Jason, I have two questions please. 1- If my data set is 4D, where each data point (row) consists of four attributes, do I bootstrap the whole data points or can I mix between the attributes? 2- If my data set is large, e.g. 300k, can I resample a subset, e.g. 5000, each time? For my case I need to do the sampling 1000 times, where each time I need only 5000. I got a memory error when trying to resample 300k. Thanks

- Jason Brownlee July 26, 2019 at 8:37 am #
  
  You select across rows (samples) not columns (features).
  
Cicely August 14, 2019 at 7:41 pm #

Hi Jason,

Newbie question, I’m afraid. If I take 10,000 random samples from a Normal(mu_0, sigma_0), then calculate mean and sd for the 10k samples, I have mu_1 and sigma_1, i.e. slightly different from mu_0 and sigma_0, as to be expected. (Please excuse notation)

Using the bootstrap, I expected the estimates returned to approach mu_0 and sigma_0, i.e. those of the population. But they don’t: they approach mu_1 and sigma_1, those of the 10k sample.

Have I misunderstood the application of the bootstrap method?

- Jason Brownlee August 15, 2019 at 8:04 am #
  
  We don’t have new samples, so we are not doing law of large numbers and better approximating the population parameters.
  
  Not sure your test/comparison is reasonable. Perhaps choosing a distribution parameter is inappropriate for the demo as it’s confusing, perhaps a nonlinear function of the samples would make the example clearer?
  
  - Cicely August 15, 2019 at 8:12 am #
    
    Ah, I think I see. Thank you ever so much for replying so quickly, I’m very grateful.
    
    - Jason Brownlee August 15, 2019 at 8:29 am #
      
      I’m glad it helped.
      
Ali October 9, 2019 at 6:42 am #

Hi Jason,

Sorry this is long, but I would really appreciate the help!
This post was helpful as I am trying to “increase” my sample size so that I may improve my model estimations. I am working in R for this project, but I’m familiar with Python as well. I think a previous comment was attempting to address this concern, but it wasn’t clear.

I have data for wildlife detections and the environmental characteristics of the survey sites where the detections occurred. I want wildlife detections from more survey sites so that the occupancy model I am using can provide more accurate estimates of species richness. I only have 40 survey sites with wildlife detections and I’d like to have 80. I want to make sure that my detections are resampled with the other dependent variables in mind when bootstrapping so that the detections are relatively consistent with environmental characteristics.

Is there a way to make sure the feature I want to resample from a dataset is resampled with other dependent variables considered? Or, a way to make “new survey” sites with new detections based on the data from the 40 survey sites I already have (bootstrap multiple features at a time)?

Thank you for any help or references!

- Jason Brownlee October 9, 2019 at 8:19 am #
  
  Yes, you could tie all variables together into a data structure in memory, then resample the collection of aggregate “records”.
  
  - Ali October 10, 2019 at 6:39 am #
    
    Thanks for the reply!
    What exactly do you mean by that? Do you mean I should save the variables in a data frame then resample rows from the data frame?
    
    Thanks!
    Ali
    
    - Jason Brownlee October 10, 2019 at 7:06 am #
      
      Or whatever structure you’re comfortable working with.
      
iram shahzadi October 20, 2019 at 11:15 pm #

It really helped me a lot. Thanks

- Jason Brownlee October 21, 2019 at 6:18 am #
  
  I’m happy to hear that.
  
Penryr October 21, 2019 at 5:45 pm #

You should explain why “with replacement” is important and what it achieves

- Jason Brownlee October 22, 2019 at 5:45 am #
  
  Great suggestion, thanks.
  
Chris December 4, 2019 at 7:24 am #

From a given sample of 400, I had sub-samples divided by a categorical variable into two groups, let us say A and B, with A=106 samples and B=294. Since there is a great imbalance between the two sample counts, will bootstrapping help in doing a correlation for group A? I am only interested in doing a correlation test with group A.

- Jason Brownlee December 4, 2019 at 1:54 pm #
  
  Hmmm, small sample sizes.
  
  Using many thousands of repeats might help you estimate the desired quantity. I feel nervous though.
  
  Using a stratified resampling procedure might help.
  
Lydia December 5, 2019 at 11:10 pm #

Thanks for the post. With confidence intervals calculated by multiplying the t value by the standard error of the mean, there is a clear relationship between sample size and width of confidence interval (quadrupling sample size halves width). I was wondering what the link is between sample size and the width of a confidence interval calculated by bootstrapping?

- Jason Brownlee December 6, 2019 at 5:18 am #
  
  Perhaps related to the law of large numbers when estimating a quantity:
  https://machinelearningmastery.com/a-gentle-introduction-to-the-law-of-large-numbers-in-machine-learning/
  
Junaidda December 12, 2019 at 6:52 pm #

Am I correct that the bootstrap is only required when the sample of the study is small? Then it needed the repetition through the bootstrap to increase the sample.

- Jason Brownlee December 13, 2019 at 5:57 am #
  
  Not only, you can use the method generally to estimate a quantity, such as model accuracy when presenting a final model.
  
ömer emhan January 4, 2020 at 8:11 am #

Hi Mr. Brownlee;
In this tutorial, the resample function chooses the same values for the boot and OOB samples on every trial.
I mean that on every cycle I get:
Bootstrap Sample: [0.6, 0.4, 0.5, 0.1]
OOB Sample: [0.2, 0.3]
I need to divide the dataset into 10 portions with the resample function and want to get different sets for train and test. Each train and test set will then be used in an ML algorithm, i.e. I need a bootstrap aggregation. But the output of the resample function is the same on each loop.

- Jason Brownlee January 4, 2020 at 8:42 am #
  
  Yes, after a sample is selected, you must use it to create the train/test sets manually.
  
  You can see an example here:
  https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/
  
Sikder Tahsin Al-Amin February 11, 2020 at 8:16 am #

Really the example is very helpful. I had a hard time to understand the concept just by reading several links. Then the example of this link cleared my idea. Thank you.

- Jason Brownlee February 11, 2020 at 1:42 pm #
  
  Thanks, I’m happy it helped!
  
Mounir February 25, 2020 at 2:05 am #

Perfectly explained ! Thanks a lot !

- Jason Brownlee February 25, 2020 at 7:49 am #
  
  Thanks, I’m happy it was useful.
  
Gaby August 6, 2020 at 6:40 am #

Hi!!!
After having 500k and they aren’t within CI95%, what does it mean? That I should analyze more k (maybe 1000)? Or that the proposed model does not work?

Thanks!

Annalysa K Lovos October 9, 2020 at 2:51 pm #

Hello,
I’m wondering if you can point to any examples of how to write up bootstrapped statistics – reporting on a small pilot study in my case.

Thanks!

- Jason Brownlee October 10, 2020 at 6:58 am #
  
  This might give you ideas:
  https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/
  
Dr. Sane October 29, 2020 at 2:28 am #

Thank you very much for this awesome explanation.

I want to repeat the bootstrap X times. How can I implement that in your code?

Best Regards

Reply
- Jason Brownlee October 29, 2020 at 8:04 am #
  
  Why?
  
  Perhaps put another loop around the bootstrap.
  
  Reply
  - Dr. Sane October 30, 2020 at 12:50 am #
    
    In other words, the output of the bootstrap should be X resample packages generated from the original data. In your example I only see 1 resample package being created. How do I create multiple?
    
    Thanks.
    
    Reply
    - Jason Brownlee October 30, 2020 at 6:53 am #
      
      See this worked example:
      https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/
      
      Reply
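
The suggestion above to "put another loop around the bootstrap" can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's own code: n_repeats stands in for the "X" in the question, the mean is an arbitrary choice of statistic, and the seed is varied per repeat so that each resample differs.

```python
import numpy as np
from sklearn.utils import resample

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
n_repeats = 100  # the "X" in the question; a hypothetical choice

boot_means = []
for i in range(n_repeats):
    # draw one bootstrap sample per repeat, with a new seed each time
    boot = resample(data, replace=True, n_samples=len(data), random_state=i)
    # record the statistic of interest for this resample
    boot_means.append(np.mean(boot))

print('Resamples drawn: %d' % len(boot_means))
print('Mean of bootstrap means: %.3f' % np.mean(boot_means))
print('Std of bootstrap means: %.3f' % np.std(boot_means))
```

The spread of boot_means gives a rough picture of how much the statistic varies from resample to resample.
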
Mico Reinier October 29, 2020 at 2:58 am #

Hola,

How can you set a certain number of repetitions? I can't find it on your page. Sincerely, Mico

Reply
- Jason Brownlee October 29, 2020 at 8:05 am #
  
  Here is a worked example:
  https://machinelearningmastery.com/calculate-bootstrap-confidence-intervals-machine-learning-results-python/
  
  Reply
Rudi Greg October 29, 2020 at 3:20 am #

Nice Explanation!

Reply
- Jason Brownlee October 29, 2020 at 8:05 am #
  
  Thanks!
  
  Reply
Syed Khurram Mahmud December 29, 2020 at 10:09 am #

Hi there! Thank you very much for the post and the great work. I am an avid follower. I am sorry if this has been answered above or if I am missing something really basic. Can we estimate variance with the bootstrap method?
I have a distribution with mean 0.08 and variance 0.0001.
The mean is estimated correctly, but I want to estimate the variance too.

Thank you very much.

Reply
- Jason Brownlee December 29, 2020 at 1:31 pm #
  
  You’re welcome.
  
  Good question, off the cuff I would guess yes you can, but I would caution you to check the literature to confirm.
  
  Reply
Syed Khurram Mahmud December 29, 2020 at 6:48 pm #

Thank you for the reply. It encouraged me to dig further, and I found the mistake I had made. I used the np.var(boot) function and it gave me the variance correctly after 900 samples. The 0.0001 I mentioned above is actually the standard deviation, so what I was getting was the variance after all. So yes, we can get the variance for a distribution, but it takes a lot more samples.

import numpy as np
from sklearn.utils import resample

# mean and standard deviation of the distribution to be estimated
mu_1, sigma_1 = 0.08, 0.0001

# draw 900 observations, one scalar at a time
INT = []
for i in range(900):
    INT_A = np.random.normal(mu_1, sigma_1)
    INT.append(INT_A)

# prepare bootstrap sample
boot = resample(INT, replace=True, n_samples=len(INT), random_state=1)
print('Bootstrap Sample: %s' % boot)
# out of bag observations
oob = [x for x in INT if x not in boot]
print('OOB Sample: %s' % oob)

print('MEAN:')
print(np.mean(boot))
print('Variance')
print(np.var(boot))

Please correct me if I am wrong or if I am taking a lot more samples than required.

Thank you in advance.

Reply
- Jason Brownlee December 30, 2020 at 6:35 am #
  
  You’re welcome.
  
  Thanks for sharing. Sorry, I don’t have the capacity to review code.
  
  Reply
Nours December 31, 2020 at 4:51 pm #

I need to understand whether the resample function performs the first step of the bootstrap?

Reply
- Jason Brownlee January 1, 2021 at 5:23 am #
  
  Sorry, I don’t understand your question. Can you please elaborate?
  
  Reply
Nitin Pasumarthy January 6, 2021 at 5:09 pm #

Thanks again for such a concise and easy to understand post Jason. Couple of questions,

1. How do you compute the CI if the chosen statistic does not form a Gaussian distribution?
2. To address, “Unfortunately, the resample() API does not include any mechanism to easily gather the out-of-bag observations that could be used as a test set to evaluate a fit model.”, why can’t we set aside some data and not use it in any samples?

Reply
- Jason Brownlee January 7, 2021 at 6:16 am #
  
  You’re welcome.
  
  The bootstrap can be used directly for a non-Gaussian distribution.
  
  The bootstrap requires drawing samples from the dataset; some examples will be in the sample and some will be out of the sample.
  
  Reply
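
To illustrate the point that the bootstrap can be applied directly to a non-Gaussian distribution, here is a minimal sketch of a percentile confidence interval for the median of skewed data. The exponential dataset, the choice of the median, n_repeats and the 95% level are all assumptions made for illustration, and the percentile method is only one of several ways to form a bootstrap interval.

```python
import numpy as np

rng = np.random.default_rng(1)
# skewed, non-Gaussian data made up for this example
data = rng.exponential(scale=2.0, size=200)

n_repeats = 1000  # hypothetical number of bootstrap repeats
stats = np.empty(n_repeats)
for i in range(n_repeats):
    # resample with replacement, same size as the original data
    sample = rng.choice(data, size=data.size, replace=True)
    stats[i] = np.median(sample)

# percentile interval: take the middle 95% of the bootstrap statistics
alpha = 0.95
lower = np.percentile(stats, (1 - alpha) / 2 * 100)
upper = np.percentile(stats, (1 + alpha) / 2 * 100)
print('Sample median: %.3f' % np.median(data))
print('95%% percentile interval: [%.3f, %.3f]' % (lower, upper))
```

No Gaussian assumption is made anywhere: the interval is read straight off the empirical distribution of the resampled statistic. The same loop works for other statistics, such as the standard deviation, by swapping np.median for np.std.
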
Mansik March 8, 2021 at 12:24 am #

Hello professor.

I am a graduate student in the US, and I have been struggling with an issue: bootstrapping when the sample size is large.

Specifically, I am doing LPA, which also requires bootstrapping.
When people have a sample size of 400 in their data, they typically resample (bootstrap) 10,000 times, or at least 1,000 times.

My sample is large, n=45,000. In this case, how many times do I need to bootstrap?
There are many questions about minimum sample size, but my case looks unusual.

Reply
- Jason Brownlee March 8, 2021 at 4:54 am #
  
  I think 1,000 times would be overkill, 30 or 100 might be sufficient. But it probably depends on the density/complexity of the dataset.
  
  Reply
Fatemeh Nikkhoo January 16, 2023 at 3:20 am #

You have clearly discussed the subject! Thanks a lot!

Reply
- James Carmichael January 16, 2023 at 8:20 am #
  
  You are very welcome Fatemeh! We appreciate the support and feedback!
  
  Reply
Dave January 31, 2023 at 2:06 pm #

Jason, thank you for such a great explanation.

I have been thinking about this: can we split the dataset into a train set and a test set and then perform the bootstrap on each of them independently?
That way, each time we can use a bootstrap sample from the train set and its corresponding test set to train and evaluate the model. What do you think of this?
In this instance, can we say the test set is the same as the out-of-bag sample?

Reply
- James Carmichael February 1, 2023 at 9:45 am #
  
  Hi Dave…That approach is reasonable. Please proceed with it and let us know your findings.
  
  Reply
Søren Fyhn August 14, 2023 at 1:59 am #

Great post. One question about the bootstrap method. When selecting with replacement and using the recommended sample size, the same as the original dataset, wouldn’t there be a risk that the remaining observations will not be enough for a meaningful test dataset? If by chance many unique values from the test are selected and only a few remain, what would you do in such a case?

Reply
- Søren Fyhn August 14, 2023 at 2:03 am #
  
  Edit to my post above, last sentence should have been:
  
  “If by chance many unique values are selected and only a few remain, what would you do in such a case?”
  
  Reply
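
As a rough check on the question above: when a bootstrap sample has the same size n as the dataset, the chance that a given observation is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below simulates this; n, n_repeats and the use of NumPy's random generator are choices made for illustration and are not taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000         # hypothetical dataset size
n_repeats = 200  # hypothetical number of bootstrap samples

oob_fractions = []
for _ in range(n_repeats):
    # draw n row indices with replacement
    boot_idx = rng.integers(0, n, size=n)
    # rows that never appear in the sample are out-of-bag
    n_oob = n - np.unique(boot_idx).size
    oob_fractions.append(n_oob / n)

expected = (1 - 1 / n) ** n
print('Expected OOB fraction: %.3f' % expected)
print('Simulated mean OOB fraction: %.3f' % np.mean(oob_fractions))
print('Smallest OOB fraction seen: %.3f' % np.min(oob_fractions))
```

For datasets of moderate size, a bootstrap sample that leaves only a handful of observations out of bag is therefore very unlikely. With very small datasets the out-of-bag set can be small or even empty, and checking its size before using it as a test set is a reasonable safeguard.
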
Juan February 8, 2024 at 5:29 am #

Hello.

How do you apply bootstrap to better estimate standard deviation and its confidence interval?

How do you apply it when you have multiple variables, for example in a regression model?

Reply
