Difference Between a Batch and an Epoch in a Neural Network

By Jason Brownlee on August 15, 2022 in Deep Learning

Stochastic gradient descent is a learning algorithm that has a number of hyperparameters.

Two hyperparameters that often confuse beginners are the batch size and number of epochs. They are both integer values and seem to do the same thing.

In this post, you will discover the difference between batches and epochs in stochastic gradient descent.

After reading this post, you will know:

Stochastic gradient descent is an iterative learning algorithm that uses a training dataset to update a model.
The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.

Let’s get started.

Overview

This post is divided into five parts; they are:

Stochastic Gradient Descent
What Is a Sample?
What Is a Batch?
What Is an Epoch?
What Is the Difference Between Batch and Epoch?

Stochastic Gradient Descent

Stochastic Gradient Descent, or SGD for short, is an optimization algorithm used to train machine learning algorithms, most notably artificial neural networks used in deep learning.

The job of the algorithm is to find a set of internal model parameters that perform well against some performance measure such as logarithmic loss or mean squared error.

Optimization is a type of search process, and you can think of this search as learning. The optimization algorithm is called “gradient descent”, where “gradient” refers to the calculation of an error gradient, or slope of error, and “descent” refers to moving down along that slope toward some minimum level of error.

The algorithm is iterative. This means that the search process occurs over multiple discrete steps, each step hopefully slightly improving the model parameters.

Each step involves using the model with the current set of internal parameters to make predictions on some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters.

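The update procedure differs between algorithms. In the case of artificial neural networks, the backpropagation algorithm is used to calculate the error gradient for each internal parameter, and gradient descent uses that gradient to update the parameters.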
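
The four parts of a single step can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the model has one parameter (`y_hat = w * x`), the error measure is mean squared error, and the function name `sgd_step`, the learning rate, and the toy data are all made up for this example.

```python
def sgd_step(w, batch, learning_rate=0.1):
    """Perform one gradient descent update for the toy model y_hat = w * x."""
    # 1. Make predictions with the current parameter.
    predictions = [w * x for x, _ in batch]
    # 2. Compare the predictions to the expected outcomes.
    errors = [p - y for p, (_, y) in zip(predictions, batch)]
    # 3. Calculate the error (mean squared error) and its gradient w.r.t. w.
    loss = sum(e * e for e in errors) / len(batch)
    gradient = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
    # 4. Use the gradient to update the parameter (move down the slope).
    new_w = w - learning_rate * gradient
    return new_w, loss


# Hypothetical toy data generated by y = 2x, so the ideal parameter is 2.0.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0
for step in range(20):
    w, loss = sgd_step(w, samples)

print(round(w, 3), round(loss, 6))
```

Each call to `sgd_step` is one iteration of the search; repeating it moves `w` toward the value that minimizes the error on these samples.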

Before we dive into batches and epochs, let’s take a look at what we mean by sample.

Learn more about gradient descent here:

Gradient Descent For Machine Learning

What Is a Sample?

A sample is a single row of data.

It contains inputs that are fed into the algorithm and an output that is used to compare to the prediction and calculate an error.

A training dataset is composed of many rows of data, i.e. many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector.
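
As a small illustration, a dataset can be held as a list of rows where each row is one sample. The values below and the convention that the last column is the expected output are assumptions made for this sketch, and `split_sample` is a hypothetical helper.

```python
# Each row is one sample: input values followed by the expected output.
# The numbers are invented purely for illustration.
dataset = [
    [5.1, 3.5, 0],
    [4.9, 3.0, 0],
    [6.2, 2.9, 1],
    [5.9, 3.0, 1],
]


def split_sample(row):
    """Separate one sample into its inputs and its expected output."""
    return row[:-1], row[-1]


inputs, expected = split_sample(dataset[0])
print(len(dataset), inputs, expected)
```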

Now that we know what a sample is, let’s define a batch.

What Is a Batch?

The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.

Think of a batch as a for-loop iterating over one or more samples and making predictions. At the end of the batch, the predictions are compared to the expected output variables and an error is calculated. From this error, the update algorithm is used to improve the model, i.e. to move down along the error gradient.

A training dataset can be divided into one or more batches.

When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.

Batch Gradient Descent. Batch Size = Size of Training Set
Stochastic Gradient Descent. Batch Size = 1
Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set
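
The three cases above can be expressed as a small function. This is a sketch only; the function name `gradient_descent_variant` is hypothetical and does not come from any library.

```python
def gradient_descent_variant(batch_size, n_samples):
    """Name the gradient descent variant implied by a batch size."""
    if not 1 <= batch_size <= n_samples:
        raise ValueError("batch size must be between 1 and the dataset size")
    # Checked first, so a one-sample dataset is reported as batch gradient descent.
    if batch_size == n_samples:
        return "batch gradient descent"
    if batch_size == 1:
        return "stochastic gradient descent"
    return "mini-batch gradient descent"


for size in (200, 1, 32):
    print(size, gradient_descent_variant(size, n_samples=200))
```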

In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials.

What if the dataset does not divide evenly by the batch size?

This can and does happen often when training a model. It simply means that the final batch has fewer samples than the other batches.

Alternatively, you can remove some samples from the dataset or change the batch size such that the number of samples in the dataset does divide evenly by the batch size.
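
To see what an uneven split looks like, the following sketch lists the size of every batch in one epoch. The helper name `batch_sizes` is made up for this example, and it assumes the common behavior described above where the final batch simply holds the leftover samples.

```python
def batch_sizes(n_samples, batch_size):
    """Return the size of each batch in one pass over the dataset."""
    full_batches, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        # The final batch holds whatever samples are left over.
        sizes.append(remainder)
    return sizes


print(batch_sizes(200, 32))      # six full batches plus a smaller final one
print(len(batch_sizes(200, 5)))  # divides evenly, so every batch is full
```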

For more on the differences between these variations of gradient descent, see the post:

A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size

For more on the effect of batch size on the learning process, see the post:

How to Control the Speed and Stability of Training Neural Networks Batch Size

A batch involves an update to the model using samples; next, let’s look at an epoch.

What Is an Epoch?

The number of epochs is a hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset.

One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch is composed of one or more batches. For example, as above, when an epoch consists of a single batch, the learning algorithm is called batch gradient descent.

You can think of a for-loop over the number of epochs where each loop proceeds over the training dataset. Within this for-loop is another nested for-loop that iterates over each batch of samples, where one batch has the specified “batch size” number of samples.
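
The nested loops can be sketched directly. In this illustration the model update is replaced by a counter, the function names `iterate_batches` and `train` are hypothetical, and shuffling the samples before each epoch is an assumption (a common choice, but not a requirement).

```python
import random


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def train(samples, n_epochs, batch_size, seed=1):
    """Run the epoch/batch loop structure and count the model updates."""
    rng = random.Random(seed)
    samples = list(samples)
    n_updates = 0
    for epoch in range(n_epochs):          # outer loop: one pass per epoch
        rng.shuffle(samples)               # assumed: reshuffle before each epoch
        for batch in iterate_batches(samples, batch_size):  # inner loop: batches
            # A real implementation would predict, measure the error on
            # this batch, and update the model parameters here.
            n_updates += 1
    return n_updates


print(train(range(10), n_epochs=3, batch_size=4))
```

With 10 samples and a batch size of 4, each epoch has three batches (4, 4, and 2 samples), so three epochs give nine updates.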

The number of epochs is traditionally large, often hundreds or thousands, allowing the learning algorithm to run until the error from the model has been sufficiently minimized. You may see examples of the number of epochs in the literature and in tutorials set to 10, 100, 500, 1000, and larger.

It is common to create line plots that show epochs along the x-axis as time and the error or skill of the model on the y-axis. These plots are sometimes called learning curves. These plots can help to diagnose whether the model has over learned, under learned, or is suitably fit to the training dataset.
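
The data behind such a plot is simply one error value per epoch. The sketch below collects those values for a toy one-parameter model (`y_hat = w * x`) and prints them instead of plotting, so it needs no plotting library. The function name `epoch_losses`, the learning rate, and the data are assumptions for illustration.

```python
def epoch_losses(samples, n_epochs, batch_size, learning_rate=0.05):
    """Train the toy model y_hat = w * x and record the error after each epoch."""
    w = 0.0
    history = []
    for epoch in range(n_epochs):
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Gradient of the mean squared error over this batch.
            gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * gradient
        # Error over the whole training dataset at the end of the epoch.
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        history.append(loss)
    return history


# Hypothetical data generated by y = 3x.
samples = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
history = epoch_losses(samples, n_epochs=5, batch_size=2)

for epoch, loss in enumerate(history, start=1):
    print(f"epoch {epoch}: loss {loss:.6f}")
```

Plotting `history` against the epoch number would give the learning curve for the training dataset.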

For more on diagnostics via learning curves with LSTM networks, see the post:

A Gentle Introduction to Learning Curves for Diagnosing Model Performance

In case it is still not clear, let’s look at the differences between batches and epochs.

What Is the Difference Between Batch and Epoch?

The batch size is the number of samples processed before the model is updated.

The number of epochs is the number of complete passes through the training dataset.

The size of a batch must be more than or equal to one and less than or equal to the number of samples in the training dataset.

The number of epochs can be set to an integer value between one and infinity. You can run the algorithm for as long as you like and even stop it using other criteria besides a fixed number of epochs, such as a change (or lack of change) in model error over time.
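
One such stopping criterion can be sketched as follows: stop once the error has failed to improve by a meaningful amount for a few epochs in a row. The function name `epochs_until_stop` and the parameter names `min_delta` and `patience` are illustrative choices for this sketch, as are the loss values.

```python
def epochs_until_stop(losses, min_delta=1e-3, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            # The error improved enough; remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The criterion never triggered, so all epochs were run.
    return len(losses)


# Hypothetical per-epoch errors that flatten out after the third epoch.
losses = [1.0, 0.5, 0.3, 0.2999, 0.2998, 0.2997]
print(epochs_until_stop(losses))
```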

They are both integer values and they are both hyperparameters for the learning algorithm, i.e. parameters for the learning process, not internal model parameters found by the learning process.

You must specify the batch size and number of epochs for a learning algorithm.

There are no magic rules for how to configure these parameters. You must try different values and see what works best for your problem.

Worked Example

Finally, let’s make this concrete with a small example.

Assume you have a dataset with 200 samples (rows of data) and you choose a batch size of 5 and 1,000 epochs.

This means that the dataset will be divided into 40 batches, each with five samples. The model weights will be updated after each batch of five samples.

This also means that one epoch will involve 40 batches or 40 updates to the model.

With 1,000 epochs, the model will be exposed to or pass through the whole dataset 1,000 times. That is a total of 40,000 batches during the entire training process.
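
The arithmetic in this example can be checked with a few lines of Python. The variable names are chosen for this sketch; `math.ceil` is used so the same calculation also holds when the dataset does not divide evenly by the batch size.

```python
import math

n_samples = 200
batch_size = 5
n_epochs = 1000

# Number of batches (and therefore model updates) in one epoch.
batches_per_epoch = math.ceil(n_samples / batch_size)

# Number of batches processed over the entire training process.
total_batches = batches_per_epoch * n_epochs

print(batches_per_epoch, total_batches)
```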

Summary

In this post, you discovered the difference between batches and epochs in stochastic gradient descent.

Specifically, you learned:

Stochastic gradient descent is an iterative learning algorithm that uses a training dataset to update a model.
The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

223 Responses to Difference Between a Batch and an Epoch in a Neural Network

Zakarie July 20, 2018 at 5:54 am #

Very informative and well explained.

- Jason Brownlee July 20, 2018 at 6:01 am #
  
  Thanks.
  
- JAIKISHAN December 29, 2020 at 1:48 pm #
  
  Thank you Jason for the explanation with absolute clarity.
  
  - Jason Brownlee December 30, 2020 at 6:30 am #
    
    You’re welcome.
    
Sameer July 20, 2018 at 6:34 am #

Great Explanation Jason..I have been your big fan and have read all of your books..it’s great to learn from u.

- Jason Brownlee July 21, 2018 at 6:25 am #
  
  Thanks.
  
  - Eucharia Onyekwere May 18, 2021 at 9:59 pm #
    
    Thanks Jason. Great explanations, your works are quite resourceful.
    
    - Jason Brownlee May 19, 2021 at 6:34 am #
      
      Thanks!
      
- Elia July 10, 2019 at 8:04 am #
  
  I love your articles. Good explanation, and I enjoyed reading it.
  
  - Jason Brownlee July 10, 2019 at 8:22 am #
    
    Thanks.
    
    - Jesse April 25, 2021 at 7:22 am #
      
      ????
      
Mark Littlewood July 20, 2018 at 7:23 am #

You nailed it with the last paragraph, a small simple toy example always trumps a description

- Jason Brownlee July 21, 2018 at 6:26 am #
  
  Thanks Mark.
  
  - med July 24, 2019 at 2:45 pm #
    
    Does the content of the batches change from one epoch to another?
    
    - Jason Brownlee July 25, 2019 at 7:38 am #
      
      Yes. The samples are shuffled at the end of each epoch and batches across epochs differ in terms of the samples they contain.
      
      - Craig March 24, 2020 at 7:35 am #
        
        If one is making a time series forecasting model (say something with an lstm layer) will the batch observations of the training set be kept in “chunks” (meaning groups of time will not be broken up, and thus the underlying pattern disrupted)? This matters, right?
        
        Your articles are great! thank you!
      - Jason Brownlee March 24, 2020 at 8:00 am #
        
        It may. It often does not matter.
        
        You can evaluate this by shuffling samples vs not shuffling samples fed into the LSTM during training or inference.
      - djm July 20, 2020 at 8:11 pm #
        
        Thank you very much for your precise explanation. If all samples are shuffled at the end of each epoch, is it possible that we may find a single sample in the datasets to be evaluated so many times and some might not be evaluated at all? Or is it possible to make once evaluated sample not to be evaluated again?
      - Jason Brownlee July 21, 2020 at 6:01 am #
        
        No. Each sample gets one opportunity to be used to update the model each epoch.
Obaid Ashraf July 20, 2018 at 3:42 pm #

Good explanation and good example .. Thankyou and keep up the good work sir !!

- Jason Brownlee July 21, 2018 at 6:29 am #
  
  Thanks.
  
Durgesh Kumar Trivedi July 20, 2018 at 5:25 pm #

Very well explained and most simple possible way !!!.

- Jason Brownlee July 21, 2018 at 6:31 am #
  
  I’m glad it helped.
  
Vijay July 20, 2018 at 5:46 pm #

Absolutely, thanks for making this Dr. Jason – this eases life without hammering head and time for some on exploring several sources

I have quick question based on (below excerpt from your post)…could you please name / refer other procedures used to update parameters in the case of other algorithms.
*******************************************************************************************
Each step involves using the model with the current set of internal parameters to make predictions on some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters.

This update procedure is different for different algorithms, but in the case of artificial neural networks, the backpropagation update algorithm is used.
*******************************************************************************************

- Jason Brownlee July 21, 2018 at 6:31 am #
  
  Glad it helped.
  
  You can learn more about other algorithms here:
  https://machinelearningmastery.com/start-here/#algorithms
  
Pospai July 20, 2018 at 7:53 pm #

Very good. But I would have liked to see more examples. In any case, THANK YOU VERY MUCH!

- Jason Brownlee July 21, 2018 at 6:33 am #
  
  Thanks. Did the example at the end help?
  
  - Subra June 26, 2023 at 1:49 pm #
    
    Is the global minimum found for each epoch during backprop?
    
Emrah July 21, 2018 at 4:23 am #

In modern deep learning approaches, I almost always see people save their models after some number of epochs (or some time period) while visualizing some kind of performance metric to choose the next values for the hyperparameters, and only then carry out their experiments for the next epochs. So we could call this procedure ‘mini-epoch stochastic deep learning’. Thanks.

- Jason Brownlee July 21, 2018 at 6:39 am #
  
  Thanks for sharing.
  
Jerry Dowetin July 22, 2018 at 8:46 pm #

This is brilliant and straight forward. Thanks for the mini course Dr. Brownlee

- Jason Brownlee July 23, 2018 at 6:08 am #
  
  I’m glad it helped.
  
Desi Mazdur August 1, 2018 at 5:58 pm #

Hello Dr Jason,

Thank you again for a great blog post. For time series data in LSTM, does it ever makes sense to have the size of a batch more than one?
I have searched and searched and I could not find any example where the batch size is more than one but I have also not found anyone saying that it does not make sense.

- Jason Brownlee August 2, 2018 at 5:59 am #
  
  Yes, when you want the model to learn across multiple subsequences.
  
  I have some posts that demonstrate this scheduled.
  
youssef August 30, 2018 at 10:47 am #

Thank you for your explanation, it is really very clear. Thanks again.

- Jason Brownlee August 30, 2018 at 4:50 pm #
  
  I’m happy that it helped.
  
Strange Xue October 30, 2018 at 12:26 am #

I’ve read many blogs written by you about such things. It does help me a lot, thank you! (fist-and-palm salute)

- Jason Brownlee October 30, 2018 at 6:02 am #
  
  Thanks, I’m glad to hear that the post helped.
  
Zsolt November 3, 2018 at 10:07 am #

Thanks, great explanation. So far your blog is the best source for learning ML I’ve found (for beginners like me).

- Jason Brownlee November 4, 2018 at 6:24 am #
  
  Thanks!
  
Jenna Ma November 10, 2018 at 10:47 pm #

It is very clear. Thank you.
I also see ‘steps_per_epoch’ in some cases, what is that mean? Is it same as batches?

- Jason Brownlee November 11, 2018 at 6:05 am #
  
  The number of batches to retrieve from a generator in order to define an epoch.
  
Francisco November 22, 2018 at 9:09 pm #

We love examples! Thank you so much!

- Jason Brownlee November 23, 2018 at 7:48 am #
  
  Thanks.
  
Mohammad November 26, 2018 at 10:19 am #

Thank you so much for your crystal clear explanation

- Jason Brownlee November 26, 2018 at 2:01 pm #
  
  I’m happy you found it useful.
  
Fatma December 6, 2018 at 5:53 pm #

Great explanation in an easy way. Thanks.

- Jason Brownlee December 7, 2018 at 5:18 am #
  
  Thanks, I’m glad it helped.
  
sangeeth January 21, 2019 at 1:48 pm #

Hi,
Updates are performed after each batches are over. I just used one sample and gave different batch_sizes in model.fit, why does the value change every time?…it should be able to take one batch size if there is only one sample, isn’t it?

- Jason Brownlee January 22, 2019 at 6:18 am #
  
  Sorry, I don’t understand your question, can you elaborate please?
  
Alaa February 8, 2019 at 5:25 am #

What a great explanation!
Never sent a reply to a tutorial, but cannot leave without saying Thanks Jason.
God bless you!

- Jason Brownlee February 8, 2019 at 8:05 am #
  
  Thanks, I’m glad it helped!
  
ANita February 26, 2019 at 9:16 pm #

Fantastic explanation !!!

- Jason Brownlee February 27, 2019 at 7:26 am #
  
  Thanks.
  
sampath April 1, 2019 at 5:56 am #

Well Explained Thanks!!

- Jason Brownlee April 1, 2019 at 7:53 am #
  
  Thanks, I’m glad it helped.
  
Momo April 12, 2019 at 7:34 pm #

Hello there,
I am currently working with Word2Vec. In connection with Epochs and batchSize I still don’t understand exactly what a sample is. Above you describe that a sample is a single row of data. In my program I first edit my text file with a SentenceIterator so that I get one sentence per line and then I use a tokenizer to get single words in these lines. Is a sample in Word2Vec a word from the data set or is it a line (containing a sentence)? Thank you very much in Advance 🙂

- Jason Brownlee April 13, 2019 at 6:27 am #
  
  The samples/epoch/batch terminology does not map onto word2vec. Instead you just have a training dataset of text from which you learn statistics.
  
Momo April 16, 2019 at 1:12 am #

But with the program Word2Vec you also have the hyperparameters Epochs, Iterations and Batch Size, which you can set… Don’t you think that they also influence the results from Word2Vec.
As I understood it now, a set passed as a Batch contains one sentence. However, I’m surprised that the number of iterations doesn’t change if I vary the number of epochs and batch sizes but don’t define iterations concretely. Do you know how that works?

- Jason Brownlee April 16, 2019 at 6:51 am #
  
  Not really, I recommend this tutorial:
  https://machinelearningmastery.com/develop-word-embeddings-python-gensim/
  
Sagara Sumathipala May 2, 2019 at 2:24 am #

Very well explained in simple terms. Thanks.

- Jason Brownlee May 2, 2019 at 8:06 am #
  
  Thanks.
  
Kaligambe Abraham May 10, 2019 at 2:27 pm #

It’s finally clear. Thank you

- Jason Brownlee May 11, 2019 at 6:04 am #
  
  I’m happy to hear that.
  
Suraj Bhagat May 12, 2019 at 12:59 am #

You are super Dr,
Thank you so much for writing in easy way to understand…. Also, try to add pics or graph or schematic diagram to represent your text. As I have seen here you gave one example, it makes many things with super clarity. In some previous post you added graph as well…

Thanks again
Please keep continue

Best regards
Suraj

- Jason Brownlee May 12, 2019 at 6:44 am #
  
  Thanks for the suggestion!
  
Karan May 14, 2019 at 6:37 pm #

Hi Jason. After every epoch, the accuracy either improves or sometimes not. For example, epoch 1 achieved accuracy of 94 and epoch 2 achieved an accuracy of 95. After the end of epoch 1 we get new weights (i.e updated after final epoch 1 batch). Is that the new weights used in epoch 2 begining to improve it from 94% to 95%? If yes, is that the reason for some epoch getting lower accuracy from the previous epoch due to the generalization of weights for the entire dataset? That’s why we get good accuracy after running so many epochs due to better generalization?

- Jason Brownlee May 15, 2019 at 8:12 am #
  
  Typically more training means better accuracy, but not always.
  
  Sometimes it can be a good idea to stop training early, see this post on the topic:
  https://machinelearningmastery.com/early-stopping-to-avoid-overtraining-neural-network-models/
  
Zakariya May 14, 2019 at 9:58 pm #

Thanks! That was simple and easy to understand.

- Jason Brownlee May 15, 2019 at 8:15 am #
  
  Thanks, I’m happy it helped!
  
saad wazir May 27, 2019 at 6:02 am #

very well explained thankyou boy

- Jason Brownlee May 27, 2019 at 6:52 am #
  
  Thanks.
  
bright stern May 28, 2019 at 5:54 am #

Well explained with easy to understand example. thank you

- Jason Brownlee May 28, 2019 at 8:21 am #
  
  Thanks, I’m glad it helped.
  
Rohaifa June 18, 2019 at 9:00 pm #

Indeed, in the last example, the total number of mini-batches is 40,000, but this is true only if the batches are selected without shuffling the training data or selected with data shuffling but without repetition. Otherwise, if within one epoch the mini batches are constructed by selecting training data with repetition, we can have some points that appear more than once in one epoch (they appear in different mini batches in one epoch) and others only once. Therefore, the total number of mini-batches, in this case, may exceed 40,000.

- Jason Brownlee June 19, 2019 at 7:54 am #
  
  Typically data is shuffled prior to each epoch.
  
  Typically we do not select samples with replacement as it will bias the training.
  
Hani Younis July 15, 2019 at 4:44 pm #

You Deserve Big Thank You letter for this explanation

- Jason Brownlee July 16, 2019 at 8:12 am #
  
  Thanks, I’m glad it helped.
  
Pavan July 27, 2019 at 9:42 pm #

thanks for this amazing blog post 🙂

If I have 1000 training samples and my batch size is 400, do I have to remove 200 samples from my training data? Should my training data always be a multiple of the batch size?

- Jason Brownlee July 28, 2019 at 6:44 am #
  
  Thanks.
  
  No, the samples will be shuffled before each epoch, then you will get 3 batches of 400, 400, and 200 samples.
  
  It is better to choose a batch size that divides the samples evenly, if possible, e.g. 100, 200, or 500 in your case.
  
Yu Bai September 16, 2019 at 12:34 pm #

Thank you so much! Such a nice explanation with an intuitive example in the end! Thank you!

- Jason Brownlee September 16, 2019 at 2:13 pm #
  
  Thanks, I’m glad it helped.
  
ahmedhesham33 October 2, 2019 at 12:10 am #

thanks for your great article , and i have a question
if i have the following settings and i am using fit_generator function
epochs =100
data=1000 images
batch = 10
step_per_epochs = 20
i know i should set the step_per_epochs = (1000/10)= 100 but if i set it to 20

Are these settings mean that the model will be trained using only part of the training data (at each epoch will use the same 200 images(batch*step_per_epochs )) and not used the all 1000 images ?
or it will use first 200 images in dataset in first epoch then the following 200 images in the second epoch and so on (will divided the 1000 images on each 5 epochs ) and model will be trained 20 times using the whole training dataset in the 100 epochs
Thanks

- Jason Brownlee October 2, 2019 at 8:00 am #
  
  Yes, only 200 images per epoch will be used.
  
Carol October 2, 2019 at 5:13 am #

Hello, how are you? Thank you very much for the explanation. I would like to know whether you know what Batch Accumulation, Random Seed, and Validation Interval (in epochs) are.

- Jason Brownlee October 2, 2019 at 8:06 am #
  
  Yes.
  
  Batch accumulation is the error collected from the samples in one batch, which is then used to update the weights.
  
  Random seed is the starting point for the random number generator:
  https://machinelearningmastery.com/introduction-to-random-number-generators-for-machine-learning/
  
  What exactly do you mean by validation interval? What context? Perhaps you mean validation dataset:
  https://machinelearningmastery.com/difference-test-validation-datasets/
  
Bipasha October 5, 2019 at 5:29 pm #

Sir thank you so much for this excellent tutorial.
Can you tell me how to run the model on a similar test dataset after training the model?

- Jason Brownlee October 6, 2019 at 8:15 am #
  
  Yes, you can use model.predict(), see examples here:
  https://machinelearningmastery.com/make-predictions-scikit-learn/
  
sahil October 7, 2019 at 11:42 pm #

Great explanation, keep sharing your knowledge,
Thank you very much.

- Jason Brownlee October 8, 2019 at 8:03 am #
  
  Thanks!
  
Milind Dalvi October 15, 2019 at 10:27 am #

Hello Jason,

If I were to create my own custom batches say within the model.fit_generator() method.

Do we create new randomly sampled batches for each epoch or do we just create batches at __init__ and use them without any changes throughout the training?

What’s the recommended way?

P.S. If I randomly sample batches each epoch I see spikes in val_acc, not sure it’s bcoz of that though!

- Jason Brownlee October 15, 2019 at 1:46 pm #
  
  Great question.
  
  It is important to ensure that each batch is representative (within reason), and that each epoch of batches is broadly representative of the problem.
  
  If not, you will push the weights all over the place or back and forth on each update and not generalize well.
  
  - Milind Dalvi October 16, 2019 at 12:44 am #
    
    Hello Jason,
    
    Thank you for your response.
    
    I also just confirmed that Keras would separate the provided X in mini-batches only once before entering the epoch loop.
    
    Here is the link to code https://github.com/keras-team/keras/blob/f242c6421fe93468064441551cdab66e70f631d8/keras/engine/training_generator.py#L160
    
    - Jason Brownlee October 16, 2019 at 8:06 am #
      
      Yes.
      
Milind Dalvi October 17, 2019 at 5:03 am #

Good Morning Jason,

A question came in my mind today.

What happens while training a Neural Network in mini-batches when the class labels are imbalanced. Are we suppose to stratify the batches?

Becoz it seems like my NN is only predicting dominant class no matter what I do!

- Jason Brownlee October 17, 2019 at 6:42 am #
  
  Great question. We get bad times!
  
  Sometimes the experts would say to alternate classes in each batch. Sometimes stratify. It might be problem/model dependent. I’m thinking back to this book:
  https://machinelearningmastery.com/neural-networks-tricks-of-the-trade-review/
  
  Nevertheless, imbalanced data is a pain regardless of your update strategy. Oversampling the training set is a great solution.
  
  - Milind Dalvi October 19, 2019 at 7:17 am #
    
    Thanks, Jason.
    
    I will surely take a look at the book.
    
    Btw I am actually in ranking business. So I got very few 1st and 2nd rankers but a lot of 3rd and above, somewhere as (10%, 10%, 80%) respectively.
    
    What I did is, I took a different perspective on the problem and converted my imbalanced multiclass dataset to an equalized binary dataset.
    
    I converted.
    
    Racing Car 1: 1st Rank
    Racing Car 2: 2nd Rank
    Racing Car 3: 3rd Rank
    Racing Car 4: 4th Rank
    
    to,
    
    Racing Car1, Racing Car2 = 0
    Racing Car2, Racing Car1 = 1
    Racing Car1, Racing Car3 = 0
    Racing Car3, Racing Car1 = 1
    Racing Car1, Racing Car4 = 0
    Racing Car4, Racing Car1 = 1
    Racing Car2, Racing Car3 = 0
    Racing Car3, Racing Car2 = 1
    Racing Car2, Racing Car4 = 0
    Racing Car4, Racing Car2 = 1
    
    and so on… where Target is now the winning side!
    
    - Jason Brownlee October 20, 2019 at 6:12 am #
      
      Fascinating! Thanks for sharing.
      
Thiyagarajan Paramadayalan November 15, 2019 at 6:18 pm #

very well explained, Jason. thanks.

- Jason Brownlee November 16, 2019 at 7:21 am #
  
  Thanks!
  
Victor December 18, 2019 at 5:57 pm #

Well explained. Thanks.

- Jason Brownlee December 19, 2019 at 6:25 am #
  
  Thanks.
  
Rajib Das December 29, 2019 at 12:15 pm #

HI Jason – I have a question – If I understood it correctly , the weights and bias are updated after running through the batch , so any change after the batch is run is applied to the next batch ? And it continues so on.

- Jason Brownlee December 30, 2019 at 5:56 am #
  
  Correct.
  
Ei Ei Mon January 6, 2020 at 11:27 pm #

Well explained. Thanks Dr. Jason.

- Jason Brownlee January 7, 2020 at 7:23 am #
  
  Thanks!
  
Ali January 12, 2020 at 8:38 am #

Thanks Dr. Jason

- Jason Brownlee January 13, 2020 at 8:16 am #
  
  You’re welcome.
  
Yiding January 17, 2020 at 1:38 am #

Super straightforward and helpful. Thank you!

- Jason Brownlee January 17, 2020 at 6:03 am #
  
  Thanks, I’m happy it was useful.
  
mingxing wang February 4, 2020 at 8:59 pm #

Hi. Sir.
I am very thankful to you.
Now I am in the middle of studying hands on machine learning and Part 2 in chapter 11 I can’t understand the meaning of batch. At first I think neural network must train by sample one by one. But they said “batch” and I can’t understand on earth.
But your article gives me a good sense about batch.
I understand batch completely with only one question.
How can I use gradient method with batch?
I mean in one sample it is understandable.
But with batch I don’t understand how to evaluate error.
Thank you.

- Jason Brownlee February 5, 2020 at 8:08 am #
  
  It does go one by one, but after “batch” number of samples the weights are updated with accumulated error.
  
Moe February 17, 2020 at 9:32 pm #

Greetings, Dr. Brownlee

I was hoping you would be able to help me with my rather long confusing questions(sorry). I am very new to deep learning.

I do think I understand the definition for iteration, batch, and epoch but I am not so sure about them in regards to them in-practice.
So I will give an example and hope that you could help me that way.

Now, I (hopefully) understand that iteration is the parameter in which it will pass through a set of samples through and back the model where Epoch will pass through (and back) all of the samples.

Assuming I have a dataset of 50,000 points.

The following parameters are set in Python/Keras as.
batch_size = 64
iterations = 50
epoch = 35

So, my assumption on what the code is doing is as follows:

50,000 samples will be divided by the batch size (=781.25 =~ 781).
So now I have 64 blocks (batches) of the whole dataset, with each containing 781 samples.

For iteration 1:
All of the blocks from 1 to 64 will be passed through the model. Each block/batch resulting in its own accuracy metric, resulting in 64 accuracy numbers that will be averaged at the end.
This above process will be repeated 35 times (the number of Epochs) resulting in 35 averaged accuracies, and as the Epoch increases along the iteration, the accuracy is (theoretically) going to be better than the previous accuracy.
After the iteration is done, the weights of the nodes will be updated and be used for iteration 2.

The above process will be repeated 50 times, as I have 50 iterations.

Is what I said true so far?

That is my major confusion at the moment.

My other question is about the accuracy and the 3 hyperparameters mentioned.

I have been playing around with the Addition RNN example over at Keras, where they set batch to 128, iterations to 200 and epochs to 1.

My question is this: if you set batch to 2048 and iterations to 4 with 50 epochs, not only do you fail to reach an accuracy of 0.999x at the end (you almost always reach this accuracy with other combinations of the parameters), but the accuracy actually dips substantially.
I have put the results in the following pastebin link [https://pastebin.com/gV1QKxH3]
and would like to bring your attention to Epoch 41/50, where the accuracy almost halved.
Is there any reason for this?
My only thought is that maybe the weights were somehow reset, but that seems extremely unlikely.

Thank you greatly for your time, as always

I hope to hear from you soon

Regards,
Moe

Reply
- Jason Brownlee February 18, 2020 at 6:20 am #
  
  What is an iteration?
  
  Reply
Mohamad Almustafa February 18, 2020 at 4:45 pm #

An iteration in deep learning is when all of the batches are passed through the model.
The epochs will repeat this process (35 times).
At the end of this process, the model will be updated with new weights.

For iteration 2, the same process will happen again, but this time the model will be using its new weights from the previous iteration.

I hope this helped

Reply
- Jason Brownlee February 19, 2020 at 7:58 am #
  
  I would call it one epoch.
  
  Reply
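
As a rough sketch of the arithmetic in the example above (50,000 samples, batch size 64, 35 epochs): the batch size fixes how many samples are in each batch, not how many batches there are. The helper name `batch_slices` is hypothetical, and the sketch assumes the final, smaller batch is kept and still triggers a weight update.

```python
def batch_slices(n_samples, batch_size):
    """Return (start, stop) index pairs that cover the dataset once (one epoch)."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

n_samples, batch_size, epochs = 50_000, 64, 35

slices = batch_slices(n_samples, batch_size)
batches_per_epoch = len(slices)                    # one weight update per batch
last_batch_size = slices[-1][1] - slices[-1][0]    # leftover samples in the final batch
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)  # 782 16 27370
```

Under these assumptions the dataset is split into 782 batches of (at most) 64 samples, and one pass over all 782 batches is one epoch.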
Yves February 18, 2020 at 8:02 pm #

Nice explanation, thank you!
Just to make sure I understood:
if one were to do batch GD, then one would not need any epochs, right?
Namely, it is really the different compositions of the mini-batches in each epoch, that make the epochs different, right?

Reply
- Jason Brownlee February 19, 2020 at 8:01 am #
  
  No. One epoch would equal one batch. Still need many epochs.
  
  Reply
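
A minimal sketch of why batch gradient descent still needs many epochs, assuming a one-weight linear model and made-up toy data (neither is from the post). With the batch size equal to the dataset size, each epoch performs exactly one update, and a single update only moves the weight part of the way.

```python
# Toy data: y = 2x, so the ideal weight is 2.0 (hypothetical example data).
X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

def mse(w):
    """Mean squared error of the model y_hat = w * x over the whole dataset."""
    return sum((w * xi - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

def batch_gd(epochs, lr=0.01):
    w = 0.0
    for _ in range(epochs):
        # Batch size == dataset size: one gradient and one update per epoch.
        grad = sum(2 * (w * xi - yi) * xi for xi, yi in zip(X, y)) / len(X)
        w -= lr * grad
    return w

w_one = batch_gd(epochs=1)
w_many = batch_gd(epochs=200)
print(w_one, mse(w_one))    # still far from 2.0 after one epoch
print(w_many, mse(w_many))  # close to 2.0 after many epochs
```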
mah.max March 26, 2020 at 3:37 am #

Thank you very much Jason. I saw that you sometimes use epochs this way:
model.fit(X_train, y_train, epochs=50)
and sometimes in a for loop like this:
for iter in range(50):
    model.fit(X_train, y_train, epochs=1)
According to the definition of an epoch, in both cases the learning algorithm works through the entire training dataset 50 times. Is that correct? Do they do the same thing? If not, could you please tell me the difference?

Reply
- Jason Brownlee March 26, 2020 at 8:00 am #
  
  They are the same thing.
  
  The manual loop gives more control in case you want to do something each epoch.
  
  Reply
  - mah.max March 27, 2020 at 7:22 am #
    
    Thank you very much.
    
    Reply
    - Jason Brownlee March 27, 2020 at 8:03 am #
      
      You’re welcome.
      
      Reply
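
The two ways of calling `fit()` discussed in this thread can be compared with a toy stand-in. `TinyModel` is hypothetical and is not Keras; it only mimics the property that matters here, namely that the weights persist between calls to `fit()`. It also omits shuffling, so the two runs match exactly; with a real model and shuffled data the results would be equivalent in effect but not necessarily identical.

```python
class TinyModel:
    """Hypothetical stand-in for a Keras model: one weight, trained with SGD."""

    def __init__(self):
        self.w = 0.0

    def fit(self, X, y, epochs, batch_size=2, lr=0.01):
        for _ in range(epochs):
            for start in range(0, len(X), batch_size):
                xb = X[start:start + batch_size]
                yb = y[start:start + batch_size]
                grad = sum(2 * (self.w * xi - yi) * xi for xi, yi in zip(xb, yb)) / len(xb)
                self.w -= lr * grad  # weights persist between calls to fit()

X_train = [1.0, 2.0, 3.0, 4.0]
y_train = [2.0, 4.0, 6.0, 8.0]

# Option 1: a single call with epochs=50.
a = TinyModel()
a.fit(X_train, y_train, epochs=50)

# Option 2: a manual loop of 50 single-epoch calls.
b = TinyModel()
for epoch in range(50):
    b.fit(X_train, y_train, epochs=1)
    # Extra per-epoch work (logging, evaluation, saving) could go here.

print(a.w, b.w)
```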
Akilu Rilwan March 29, 2020 at 10:46 pm #

Thank you Jason, you always save my search.

Reply
- Jason Brownlee March 30, 2020 at 5:33 am #
  
  You’re welcome.
  
  Reply
Rajrudra April 21, 2020 at 2:44 am #

Thanks, I had heard of stochastic gradient descent, but here, with just one line, you have cleared up the basic concept. I am just a novice, but this might be a good starting point.

Reply
- Jason Brownlee April 21, 2020 at 6:02 am #
  
  Thanks, I’m happy it helped!
  
  Reply
Rajrudra April 21, 2020 at 2:45 am #

Also thanks for the batch concept

Reply
- Jason Brownlee April 21, 2020 at 6:03 am #
  
  You’re welcome.
  
  Reply
nishan April 21, 2020 at 7:21 pm #

How should epoch, batch size affect the weight? How can you describe the relation between

Reply
- Jason Brownlee April 22, 2020 at 5:53 am #
  
  What do you mean exactly, can you please elaborate?
  
  Reply
Mira April 21, 2020 at 7:29 pm #

How many times are the weights updated? Does that depend on the batch_size and the number of epochs, or should training stop when it reaches the best weights?

Reply
- Jason Brownlee April 22, 2020 at 5:53 am #
  
  Yes, the number of times the weights are updated depends on the batch size and epochs – this is mentioned in the tutorial.
  
  There is no best weight – we stop when the model stops improving or when we have run out of time.
  
  Reply
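
As a small illustration of that dependence, using the 200-sample example referred to elsewhere in these comments (batch size 5, 1,000 epochs): the function name `total_weight_updates` is hypothetical, and the sketch assumes a final partial batch still triggers an update.

```python
import math

def total_weight_updates(n_samples, batch_size, epochs):
    """Number of weight updates over a whole training run."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs

print(total_weight_updates(200, 5, 1000))    # 40 batches per epoch -> 40000
print(total_weight_updates(200, 200, 1000))  # batch gradient descent -> 1000
print(total_weight_updates(200, 1, 1000))    # batch size of one -> 200000
```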
Sakshi April 23, 2020 at 5:09 pm #

Hi Sir

Thank you for helping us with your tutorials.
I just love your site. Thanks for making me a better data scientist 🙂

Reply
- Jason Brownlee April 24, 2020 at 5:37 am #
  
  Thanks!
  
  Reply
Ehsan April 25, 2020 at 12:10 am #

Thank you Jason for the good explanation.
Please let me know about the following issue:
What happens when one batch is fed to the network? Is the error of each sample calculated, then averaged over all samples in the batch, and does gradient descent then use this average to update the weights, or does it work in another way?

Reply
- Jason Brownlee April 25, 2020 at 6:51 am #
  
  Something like that.
  https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
  
  Reply
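
To make that slightly more precise, here is a sketch in notation that is not from the post: a batch $B$, per-sample loss $\ell_i$, weights $w$, and learning rate $\eta$. The batch loss is the average of the per-sample losses, its gradient is the average of the per-sample gradients, and one update is made per batch.

```latex
L_B(w) = \frac{1}{|B|} \sum_{i \in B} \ell_i(w), \qquad
\nabla_w L_B(w) = \frac{1}{|B|} \sum_{i \in B} \nabla_w \ell_i(w), \qquad
w \leftarrow w - \eta \, \nabla_w L_B(w)
```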
Behnaam April 26, 2020 at 2:57 am #

Jason, many thanks for the explanation. Is this statement correct?

Increasing the batch size makes the error surface smoother, so mini-batch gradient descent is preferable to stochastic gradient descent. On the other hand, we might want to keep the batch size from getting too large, so that the network gets enough update chances (iterations) from a pass over all the samples.

Reply
- Jason Brownlee April 26, 2020 at 6:18 am #
  
  Maybe.
  
  Sometimes we want a noisy estimate of the gradient to bounce around the parameter space and find a good/better set of parameters.
  
  Reply
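
The trade-off between a smooth and a noisy gradient estimate can be seen in a small experiment. This is only a sketch on made-up data (a noisy line with a one-weight model, neither from the post): it measures how much the gradient estimate varies from one random batch to the next for a few batch sizes.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical data: y = 2x + noise; the model is y_hat = w * x.
X = [rng.uniform(-1, 1) for _ in range(1000)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in X]

def batch_gradient(w, indices):
    """Gradient of the mean squared error over the chosen samples."""
    return sum(2 * (w * X[i] - y[i]) * X[i] for i in indices) / len(indices)

def gradient_spread(batch_size, trials=500, w=0.0):
    """Standard deviation of the gradient estimate across random batches."""
    grads = [batch_gradient(w, rng.sample(range(len(X)), batch_size))
             for _ in range(trials)]
    return statistics.stdev(grads)

# Batch size 1 (stochastic), 32 (mini-batch), 1000 (the full dataset).
spread = {bs: gradient_spread(bs) for bs in (1, 32, 1000)}
print(spread)
```

Smaller batches give a noisier estimate, while the full-dataset gradient barely changes between draws. Whether that noise helps or hurts depends on the problem.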
Ricardo April 26, 2020 at 3:22 pm #

You are too much sir. Thank you.

Reply
- Jason Brownlee April 27, 2020 at 5:29 am #
  
  You’re welcome.
  
  Reply
Isic May 5, 2020 at 2:32 am #

Thank you for this amazing explanation.

I wanted to ask a question before I figured out the answer :)!
Actually, the question was, “why do we need to go through the entire training dataset more than once?”, and I think the answer is that, in the first epoch, the weights are randomly initialized, but in the second one, the weights have already been updated after every batch, and so on. In other words: the weights of epoch t are “transferred” to epoch t+1. This way, the learning curve on the training set goes down.

Please correct me if I’m wrong.

Reply
- Jason Brownlee May 5, 2020 at 6:33 am #
  
  Great question!
  
  The models learn iteratively, slowly. If we learn too fast, we over learn and cannot generalize well to new data. We – the community – have learned this over 40+ years.
  
  Reply
Raj May 14, 2020 at 6:58 pm #

Thank you so much for clarifying the concept of epoch and batch size, in very simple and easy terminology.

Reply
- Jason Brownlee May 15, 2020 at 5:57 am #
  
  You’re welcome!
  
  Reply
  - Riyaz December 28, 2020 at 9:25 am #
    
    Hello sir, thank you for your amazing posts.
    
    So to make things simpler for me, say I have a linear regression model and we are in epoch 3 (it has gone through the entire dataset twice and is now doing it a third time). We still have the same dataset; the only difference is that we have updated our parameters (the coefficient values) twice. When we updated them the first time, we developed a better model with coefficients more suited to the training data. Then in epoch 2, we used the updated model from epoch 1 to train on the same dataset, and in epoch 3 we again use the model with the updated coefficients from epoch 2. So, summing up, at each epoch we have a slightly better model than the previous one, which allows us to lower the error rate? Is my understanding right?
    
    Kind regards
    
    Reply
    - Jason Brownlee December 28, 2020 at 9:51 am #
      
      You’re welcome.
      
      Exactly!
      
      Reply
Sanjay Talbar May 16, 2020 at 5:15 pm #

excellent explanation!
My question is: for SGD, will the training time be longer compared to mini-batch and batch gradient descent?

Reply
- Jason Brownlee May 17, 2020 at 6:33 am #
  
  Batch gradient descent will be faster to execute than mini-batch gradient descent.
  
  Reply
arman May 21, 2020 at 10:05 am #

Hi, thanks for the great content.
Sorry for asking an irrelevant question:
is steps_per_epoch in ImageDataGenerator just for saving time?
thank you

Reply
- Jason Brownlee May 21, 2020 at 1:41 pm #
  
  No. It is the number of batches (and therefore weight updates) drawn from the generator before one epoch is declared finished.
  
  Reply
Pravin Girase July 18, 2020 at 4:00 pm #

Jason

I have a question about shuffling the data during training. I have observed that if I run the same code multiple times, the results are not the same if I am using shuffled data. So how do I gain confidence that my code is correct when the accuracy and training losses keep changing?

So sometimes I end up fixing the training data set and validation data set. I want to know if this is correct practice. If not then how to believe that whatever results I am getting are good enough?

Reply
- Jason Brownlee July 19, 2020 at 6:24 am #
  
  Good question, you must evaluate your model many times and report the mean and standard deviation of the model’s performance.
  
  This will help:
  https://machinelearningmastery.com/faq/single-faq/why-do-i-get-different-results-each-time-i-run-the-code
  
  Reply
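
A sketch of that reporting pattern using only the standard library. `train_and_evaluate` is a hypothetical placeholder that simulates a score varying from run to run; in practice it would build, train, and evaluate a fresh model for each seed.

```python
import random
import statistics

def train_and_evaluate(seed):
    """Hypothetical placeholder: train a model with this seed and return its accuracy.

    Here it only simulates a score that changes from run to run.
    """
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.02, 0.02)

# Repeat the whole train/evaluate cycle, then summarize the distribution of scores.
scores = [train_and_evaluate(seed) for seed in range(10)]
mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)
print(f"accuracy: {mean_acc:.3f} (+/- {std_acc:.3f}) over {len(scores)} runs")
```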
Omar September 2, 2020 at 12:42 am #

Jason, you are a great teacher. I like the way you explain.
I also like that despite trying to “de-academize” the teaching, you still put references to the original papers. You strike a fairly nice balance there.

Thanks a lot.

Omar

Reply
- Jason Brownlee September 2, 2020 at 6:30 am #
  
  Thanks Omar.
  
  Reply
John September 8, 2020 at 1:26 pm #

Good to know. Thanks for the clear explanation!

Reply
- Jason Brownlee September 8, 2020 at 1:39 pm #
  
  You’re welcome.
  
  Reply
Shivan October 11, 2020 at 1:22 am #

Hello sir
If we use four different pre-trained networks, can we set a different number of epochs for each network?
For example: 15 epochs for AlexNet, 20 epochs for VGG16, and so on? I will make a comparison among these networks.

Thanks

Reply
- Jason Brownlee October 11, 2020 at 6:53 am #
  
  Sure. Train them separately and save to file, then load them later for your ensemble. I have tutorials on this, search for deep learning ensemble.
  
  Reply
Unnikrishnan November 4, 2020 at 6:42 pm #

Thank You Jason.
Your articles on Machine Learning and Deep Learning are informative and great.
For me, it's the go-to site on these topics. Thanks.

Reply
- Jason Brownlee November 5, 2020 at 6:32 am #
  
  Thank you!
  
  Reply
saran November 5, 2020 at 9:05 pm #

What is the maximum possible number of epochs for your example of 200 samples and a batch size of 5? Is there a standard formula?

Reply
- Jason Brownlee November 6, 2020 at 5:54 am #
  
  Infinite. There’s no maximum. You train until the model stops improving.
  
  Reply
Mohammad Kasra Habib December 11, 2020 at 1:43 am #

Hi,

Regarding what you said “This means that the dataset will be divided into 40 batches, each with five samples. The model weights will be updated after each batch of five samples.”

It means that the loss is computed only when one batch passed through the net, and then the gradient update takes place.

So, how is the loss computed for the 5 samples inside the batch?

E.g., my net is processing the 1st batch and my loss function is MAE. Basically, the neural network calculates the MAE for each individual instance in the batch, averages it, and eventually passes it to the optimizer (in this example, let's say it is SGD), and SGD multiplies it by the learning rate and subtracts it from the net’s weights to accomplish the gradient update. Is this correct?

If it is correct, then the loss is not computed at the end of each epoch and it only specifies how many iterations should be done on each batch.

Thanks in advance!
I like your tutorials, they are really great. KEEP THEM UP!

Reply
- Jason Brownlee December 11, 2020 at 6:39 am #
  
  The loss is averaged over the five samples in the batch.
  
  Reply
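
A minimal sketch of one update on one batch of five samples with an MAE loss, assuming a one-weight model `y_hat = w * x` and made-up numbers (neither is from the post). The point it illustrates is that the per-sample errors are averaged into one loss value, but it is the gradient of that loss, not the loss value itself, that is scaled by the learning rate and subtracted from the weights.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

# One hypothetical batch of five samples for a model y_hat = w * x.
xb = [1.0, 2.0, 3.0, 4.0, 5.0]
yb = [2.0, 4.0, 6.0, 8.0, 10.0]
w, lr = 0.0, 0.1

# Forward pass: per-sample absolute error, averaged into one batch loss (MAE).
batch_loss = sum(abs(w * xi - yi) for xi, yi in zip(xb, yb)) / len(xb)

# Backward pass: gradient of that averaged loss with respect to w.
grad = sum(sign(w * xi - yi) * xi for xi, yi in zip(xb, yb)) / len(xb)

# SGD step: the learning rate scales the gradient, not the loss value.
w_new = w - lr * grad

print(batch_loss, grad, w_new)
```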
Swetha Balaji December 14, 2020 at 4:47 pm #

With the example stated above, I finally understood the difference between epoch and batch. Thank you

Reply
- Jason Brownlee December 15, 2020 at 6:16 am #
  
  I’m happy that it helped.
  
  Reply
SuSa December 21, 2020 at 2:48 am #

Dear Jason,

Thanks for the simple explanation. I had read so many articles on ANN, without any clarity on the subject. This suddenly made everything clear.

Reply
- Jason Brownlee December 21, 2020 at 6:40 am #
  
  You’re welcome, I’m happy that it helped.
  
  Reply
Giriraj Pawar January 11, 2021 at 2:43 am #

“an epoch that has one batch is called the batch gradient descent learning algorithm”.

Batch Gradient Descent. Batch Size = Size of Training Set
Stochastic Gradient Descent. Batch Size = 1
Mini-Batch Gradient Descent. 1 < Batch Size < Size of Training Set

as per the above explanation:
if Batch Size = 1 then it should be called Stochastic Gradient Descent, so why is it being called the batch gradient descent learning algorithm?

Reply
- Jason Brownlee January 11, 2021 at 6:20 am #
  
  Here we are saying if the batch size equals the entire training dataset, this is called “batch gradient descent”.
  
  If the batch size equals one row/sample, this is called “stochastic gradient descent”.
  
  Reply
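
The three definitions quoted in this thread can be written as one small function. The name `gradient_descent_variant` is hypothetical; the logic simply follows the definitions above.

```python
def gradient_descent_variant(batch_size, n_samples):
    """Name the variant implied by the batch size, following the definitions above."""
    if not 1 <= batch_size <= n_samples:
        raise ValueError("batch_size must be between 1 and n_samples")
    if batch_size == n_samples:
        return "batch gradient descent"
    if batch_size == 1:
        return "stochastic gradient descent"
    return "mini-batch gradient descent"

for bs in (200, 1, 5):
    print(bs, gradient_descent_variant(bs, n_samples=200))
```

So "an epoch that has one batch" means the single batch contains the whole training set, which is different from a batch that contains one sample.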
Questioner January 18, 2021 at 4:04 pm #

If more epochs mean more learning, why don’t we set the number to something large (e.g. 10,000 or 100,000)? Then the result would get better and better.

Reply
- Jason Brownlee January 19, 2021 at 6:34 am #
  
  Good question.
  
  Diminishing returns – e.g. no further improvement or even making the model worse (overfitting) after some point.
  
  Reply
Vesto January 21, 2021 at 9:45 pm #

Thank you for this great work Jason. I found myself asking that question today after using these terms for a while now!

Reply
- Jason Brownlee January 22, 2021 at 7:19 am #
  
  You’re welcome!
  
  Reply
Gabba February 10, 2021 at 5:13 am #

Amazing.
Thanks

Reply
- Jason Brownlee February 10, 2021 at 8:12 am #
  
  You’re welcome!
  
  Reply
MER February 15, 2021 at 3:26 pm #

Your articles are amazing, it’s so clear. Thank you so much

Reply
- Jason Brownlee February 15, 2021 at 4:01 pm #
  
  Thank you!
  
  Reply
Bilew February 27, 2021 at 7:25 pm #

it is a short and brief explanation thanks.

Reply
- Jason Brownlee February 28, 2021 at 4:33 am #
  
  You’re welcome!
  
  Reply
Nilu R Salim March 1, 2021 at 4:19 pm #

That’s why we go for early stopping, to avoid overfitting.

Reply
- Jason Brownlee March 2, 2021 at 5:42 am #
  
  Yes, if we have sufficient additional data, early stopping can be very effective:
  https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
  
  Reply
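
A rough sketch of the early stopping idea, written in plain Python rather than with any library's callback. The function name, the `patience` parameter, and the validation losses are all hypothetical: training halts once the validation loss has failed to improve for `patience` epochs in a row, and the best epoch is remembered.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based, for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience` epochs in a row.
    """
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: improving at first, then getting worse (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
print(stop_epoch, best_epoch)  # 6 4
```

This is also why the number of epochs can be set generously: the validation data, not the epoch count, decides when to stop.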
Ajoku Kingsley kelechi March 22, 2021 at 8:14 am #

Thank you Jason. Please, do the weights get updated after each batch of samples has been passed, or after a full epoch has been completed?

Reply
- Jason Brownlee March 23, 2021 at 4:53 am #
  
  Yes, model weights are updated after each batch.
  
  Reply
David Espinosa April 9, 2021 at 8:42 am #

One of the nicest summaries I have read so far about this subject!

Thank you very much Dr. Brownlee!

Reply
- Jason Brownlee April 10, 2021 at 5:57 am #
  
  Thanks, I’m happy it helped.
  
  Reply
Shaima' May 24, 2021 at 8:37 pm #

Thanks a lot for your wonderful explanation.

Reply
- Jason Brownlee May 25, 2021 at 6:07 am #
  
  You’re welcome.
  
  Reply
Roshan Basnet June 16, 2021 at 4:35 am #

Very helpful content with clear explanation.

Reply
- Jason Brownlee June 16, 2021 at 6:22 am #
  
  Thanks.
  
  Reply
chirag palan June 23, 2021 at 6:11 am #

Before googling this question, I thought I would be the only one looking for the “difference between epoch and batch size”, but after looking at all the comments I was very much surprised.
But it is very clear now.
Thanks Jason

Reply
- Jason Brownlee June 24, 2021 at 5:55 am #
  
  I’m happy it helped!
  
  Reply
Taher August 13, 2021 at 12:01 pm #

Thanks a lot Jason! Very well explained. These terminologies are very confusing for beginners. Your article solved this mystery for me.

Reply
- Adrian Tam August 14, 2021 at 2:33 am #
  
  Thank you. Glad you like it.
  
  Reply
MS August 26, 2021 at 4:01 pm #

Thank you so much for this article. Your articles on machine learning are so easy to grasp and I always follow them.

Reply
- Adrian Tam August 27, 2021 at 6:00 am #
  
  Hope you enjoyed it! Thank you.
  
  Reply
Ahmed Hamadto September 2, 2021 at 6:35 pm #

Your articles are amazing, and a testament to your understanding of machine learning, your work is vital to the community and is very highly appreciated, please keep it up. I may not write on every article but know that each and every one of them is appreciated.

Thank you, once again.

Reply
- Jason Brownlee September 3, 2021 at 5:29 am #
  
  Thank you deeply!
  
  Reply
Aitha Sahith October 25, 2021 at 4:43 am #

Yeah it helped thanks!!!!

Reply
Nico H. November 5, 2021 at 2:01 pm #

It helped me a lot! Thank you! 🙂

Reply
- Adrian Tam November 7, 2021 at 8:03 am #
  
  Glad you liked it!
  
  Reply
Fahad November 6, 2021 at 9:01 pm #

Amazing blog! Explained the difference between the two so clearly.

Reply
Amit December 29, 2021 at 10:13 pm #

Great explanation, you nailed it..

Reply
- James Carmichael December 30, 2021 at 10:06 am #
  
  Thank you for the feedback and kind words Amit!
  
  Reply
Aditya February 5, 2022 at 1:46 am #

Dear Sir,
Thank you for the example at the end of this blog. I have a question regarding epochs. I am posing the question using the same example you used.

As per the example, we have 1000 epochs with each epoch having 40 batches. I am curious whether the composition of these batches is the same for each epoch or is shuffled with every epoch.

Regards,
Aditya

Reply
- James Carmichael February 5, 2022 at 11:02 am #
  
  Hi Aditya…This can be controlled. The following discussion may be of interest:
  
  https://datascience.stackexchange.com/questions/53927/are-mini-batches-sampled-randomly-in-keras-sequential-fit-method
  
  Reply
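
A sketch of what "controlled" means here, using the 200-sample, batch-size-5 example. In Keras this is the `shuffle` argument of `fit()`, which defaults to `True`; the helper `epoch_batches` below is a hypothetical pure-Python illustration and not the Keras implementation.

```python
import random

def epoch_batches(n_samples, batch_size, shuffle, rng):
    """Return the sample indices of each batch for one epoch."""
    indices = list(range(n_samples))
    if shuffle:
        rng.shuffle(indices)  # a new order each time this is called
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

rng = random.Random(1)

# Without shuffling, every epoch sees the same 40 batches in the same order.
fixed_1 = epoch_batches(200, 5, shuffle=False, rng=rng)
fixed_2 = epoch_batches(200, 5, shuffle=False, rng=rng)

# With shuffling, the batches are recomposed each epoch.
mixed_1 = epoch_batches(200, 5, shuffle=True, rng=rng)
mixed_2 = epoch_batches(200, 5, shuffle=True, rng=rng)

print(len(mixed_1), fixed_1 == fixed_2, mixed_1 == mixed_2)
```

Either way, each epoch still covers every sample exactly once; only the grouping into batches changes.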
Julie Smith May 9, 2022 at 10:34 am #

This is very helpful but I’m a little confused about batch size vs optimizers. I’m using pytorch code I got online and it uses a mini-batch gradient descent (i.e. they define a batch size of 128) and later in the code they call a SGD (stochastic gradient descent) optimizer. Can one use a mini batch gradient descent with a SGD optimizer? Is that common? Also, I was playing around with batch size and used a batch size of 1 (which is defined above as a stochastic gradient descent) and left the optimizer the same. Does this make sense to do?

Reply
- James Carmichael May 9, 2022 at 10:57 am #
  
  Hi Julie…the following is a great resource that may prove helpful:
  
  https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/
  
  Reply
ADUGE August 19, 2022 at 9:08 pm #

Waooo this is incisive and wonderful

Reply
Percy August 24, 2022 at 12:01 pm #

I understand. Thank you.

Reply
- James Carmichael August 25, 2022 at 7:46 am #
  
  You are very welcome Percy! Keep up the great work!
  
  Reply
ope April 3, 2023 at 7:13 pm #

Straightforward and well explained, thank you for this.

Reply
- James Carmichael April 4, 2023 at 5:32 am #
  
  Thank you for your feedback ope! We greatly appreciate it!
  
  Reply
Rudi Ranck April 12, 2023 at 4:05 am #

I’d consider Jason Brownlee one of the main references in the field, and definitely in the top 10 for teaching. He is the best when it comes to simplifying AI content while preserving everything that matters.

keep it up!

Reply
- James Carmichael April 12, 2023 at 7:37 am #
  
  Thank you Rudi for your support and feedback!
  
  Reply
Chengjin April 19, 2023 at 9:58 pm #

Thank you, and very nice explanation!

Reply
- James Carmichael April 20, 2023 at 6:10 am #
  
  You are very welcome Chengjin! We appreciate your feedback.
  
  Reply
ARIFUL Islam May 30, 2023 at 6:20 pm #

How do I train multiclass classification with the Mask R-CNN algorithm, using TensorFlow and Keras, step by step from A to Z?

Reply
- James Carmichael May 31, 2023 at 9:19 am #
  
  Hi ARIFUL…The following may be of interest to you:
  
  https://viso.ai/deep-learning/mask-r-cnn/
  
  Reply
Majid Liaquat June 15, 2023 at 8:14 pm #

Hi, maybe I am wrong, but I just want to check whether this is a typo or I am misunderstanding. As you mentioned:

“When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.”

And in below its written: Mini-Batch Gradient Descent. 1 < Batch Size Batch Size < Size of Training Set ?

Thanks in Advance

Reply
- James Carmichael June 16, 2023 at 7:28 am #
  
  Hi Majid…There is no typo. The following may prove beneficial to add clarity:
  
  https://towardsdatascience.com/batch-mini-batch-stochastic-gradient-descent-7a62ecba642a
  
  Reply
Ali February 17, 2024 at 1:30 am #

thanks. Great

Reply
- James Carmichael February 17, 2024 at 9:55 am #
  
  You are very welcome Ali!
  
  Reply
