
A Gentle Introduction to Pooling Layers for Convolutional Neural Networks

Convolutional layers in a convolutional neural network summarize the presence of features in an input image.

A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to down sample the feature maps. This has the effect of making the resulting down sampled feature maps more robust to changes in the position of the feature in the image, referred to by the technical phrase “local translation invariance.”

Pooling layers provide an approach to down sampling feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated presence of a feature, respectively.

In this tutorial, you will discover how the pooling operation works and how to implement it in convolutional neural networks.

After completing this tutorial, you will know:

  • Pooling can be used to down sample the detection of features in feature maps.
  • How to calculate and implement average and maximum pooling in a convolutional neural network.
  • How to use global pooling in a convolutional neural network.


Let’s get started.

Photo by Nicholas A. Tonelli, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. Pooling Layers
  2. Detecting Vertical Lines
  3. Average Pooling Layer
  4. Max Pooling Layer
  5. Global Pooling Layers


Pooling Layers

Convolutional layers in a convolutional neural network systematically apply learned filters to input images in order to create feature maps that summarize the presence of those features in the input.

Convolutional layers prove very effective, and stacking convolutional layers in deep models allows layers close to the input to learn low-level features (e.g. lines) and layers deeper in the model to learn high-order or more abstract features, like shapes or specific objects.

A limitation of the feature maps output by convolutional layers is that they record the precise position of features in the input. This means that small movements in the position of the feature in the input image will result in a different feature map. This can happen with re-cropping, rotation, shifting, and other minor changes to the input image.

A common approach to addressing this problem from signal processing is called down sampling. This is where a lower resolution version of an input signal is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task.

Down sampling can be achieved with convolutional layers by changing the stride of the convolution across the image. A more robust and common approach is to use a pooling layer.

A pooling layer is a new layer added after the convolutional layer, specifically after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by a convolutional layer. For example, the layers in a model may look as follows:

  1. Input Image
  2. Convolutional Layer
  3. Nonlinearity
  4. Pooling Layer

The addition of a pooling layer after the convolutional layer is a common pattern used for ordering layers within a convolutional neural network that may be repeated one or more times in a given model.

The pooling layer operates upon each feature map separately to create a new set of the same number of pooled feature maps.

Pooling involves selecting a pooling operation, much like a filter to be applied to feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels.

This means that the pooling layer will always reduce the size of each feature map by a factor of 2, i.e. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter the size. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).

The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:

  • Average Pooling: Calculate the average value for each patch on the feature map.
  • Maximum Pooling (or Max Pooling): Calculate the maximum value for each patch of the feature map.
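As a minimal illustration (not from the original listing), both functions applied to the same 2×2 patch of activations:

```python
# one 2x2 patch of a feature map, flattened to four values
patch = [0.0, 0.0, 3.0, 3.0]

# average pooling: summarize the average presence of the feature
average_pooled = sum(patch) / len(patch)

# max pooling: summarize the most activated presence of the feature
max_pooled = max(patch)

print(average_pooled)  # 1.5
print(max_pooled)      # 3.0
```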

The result of using a pooling layer and creating down sampled or pooled feature maps is a summarized version of the features detected in the input. This is useful because small changes in the location of the feature in the input, as detected by the convolutional layer, will result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model's invariance to local translation.

In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.

— Page 342, Deep Learning, 2016.

Now that we are familiar with the need and benefit of pooling layers, let’s look at some specific examples.

Detecting Vertical Lines

Before we look at some examples of pooling layers and their effects, let’s develop a small example of an input image and convolutional layer to which we can later add and evaluate pooling layers.

In this example, we define a single input image or sample that has one channel and is an 8 pixel by 8 pixel square with all 0 values and a two-pixel wide vertical line in the center.

Next, we can define a model that expects input samples to have the shape (8, 8, 1) and has a single hidden convolutional layer with a single filter with the shape of 3 pixels by 3 pixels.

A rectified linear activation function, or ReLU for short, is then applied to each value in the feature map. This is a simple and effective nonlinearity that, in this case, will not change the values in the feature map, but is present because we will later add pooling layers, and as a best practice pooling is added after the nonlinearity has been applied to the feature maps.

The filter would normally be initialized with random weights as part of the initialization of the model.

Instead, we will hard code our own 3×3 filter that will detect vertical lines. That is, the filter will strongly activate when it detects a vertical line and weakly activate when it does not. We expect that by applying this filter across the input image, the output feature map will show that the vertical line was detected.

Next, we can apply the filter to our input image by calling the predict() function on the model.

The result is a four-dimensional output with one batch, a given number of rows and columns, and one filter, or [batch, rows, columns, filters]. We can print the activations in the single feature map to confirm that the line was detected.

Tying all of this together, the complete example is listed below.
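As a stand-in for the full Keras listing, the following plain-Python sketch performs the same computation as the described model: the 8×8 input image with a two-pixel wide vertical line, the hard-coded 3×3 vertical line filter applied as a valid convolution, and a ReLU nonlinearity.

```python
# 8x8 input image: all zeros with a two-pixel wide vertical line in the center
image = [[0, 0, 0, 1, 1, 0, 0, 0] for _ in range(8)]

# hard-coded 3x3 vertical line detector, as described above
kernel = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

def conv2d_valid_relu(img, k):
    """Apply a 3x3 filter with stride 1 and no padding, then ReLU."""
    out = []
    for r in range(len(img) - 2):
        row = []
        for c in range(len(img[0]) - 2):
            total = sum(img[r + i][c + j] * k[i][j]
                        for i in range(3) for j in range(3))
            row.append(max(0.0, float(total)))  # ReLU
        out.append(row)
    return out

feature_map = conv2d_valid_relu(image, kernel)
for row in feature_map:
    print(row)
```

Each of the six printed rows is [0.0, 0.0, 3.0, 3.0, 0.0, 0.0], confirming that the filter detected the vertical line in the middle of the input.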

Running the example first summarizes the structure of the model.

Of note is that the single hidden convolutional layer will take the 8×8 pixel input image and will produce a feature map with the dimensions of 6×6.

We can also see that the layer has 10 parameters: that is, nine weights for the filter (3×3) and one weight for the bias.

Finally, the single feature map is printed.

We can see from reviewing the numbers in the 6×6 matrix that indeed the manually specified filter detected the vertical line in the middle of our input image.

We can now look at some common approaches to pooling and how they impact the output feature maps.

Average Pooling Layer

On two-dimensional feature maps, pooling is typically applied in 2×2 patches of the feature map with a stride of (2,2).

Average pooling involves calculating the average for each patch of the feature map. This means that each 2×2 square of the feature map is down sampled to the average value in the square.

For example, the output of the line detector convolutional filter in the previous section was a 6×6 feature map. We can look at applying the average pooling operation to the first line of that feature map manually.

The first line for pooling (the first two rows and all six columns) of the output feature map was as follows:

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

The first pooling operation is applied as follows:

average(0.0, 0.0, 0.0, 0.0) = 0.0

Given the stride of two, the operation is moved along two columns to the right and the average is calculated:

average(3.0, 3.0, 3.0, 3.0) = 3.0

Again, the operation is moved along two columns to the right and the average is calculated:

average(0.0, 0.0, 0.0, 0.0) = 0.0

That's it for the first line of pooling operations. The result is the first line of the average pooling operation:

[0.0, 3.0, 0.0]

Given the (2,2) stride, the operation would then be moved down two rows and back to the first column and the process continued.

Because the downsampling operation halves each dimension, we will expect the output of pooling applied to the 6×6 feature map to be a new 3×3 feature map. Given the horizontal symmetry of the feature map input, we would expect each row to have the same average pooling values. Therefore, we would expect the resulting average pooling of the detected line feature map from the previous section to look as follows:

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]

We can confirm this by updating the example from the previous section to use average pooling.

This can be achieved in Keras by using the AveragePooling2D layer. The default pool_size (e.g. like the kernel size or filter size) of the layer is (2,2) and the default strides is None, which in this case means using the pool_size as the strides, which will be (2,2).

The complete example with average pooling is listed below.
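The Keras listing (the previous model with an AveragePooling2D layer appended) is not shown here; this plain-Python sketch applies the same 2×2 average pooling with a (2,2) stride to the 6×6 feature map from the previous section:

```python
# the 6x6 feature map produced by the vertical line detector
feature_map = [[0.0, 0.0, 3.0, 3.0, 0.0, 0.0] for _ in range(6)]

def average_pool_2x2(fmap):
    """2x2 average pooling with a (2,2) stride, as AveragePooling2D applies by default."""
    pooled = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            patch = [fmap[r][c], fmap[r][c + 1],
                     fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(sum(patch) / 4.0)
        pooled.append(row)
    return pooled

pooled = average_pool_2x2(feature_map)
for row in pooled:
    print(row)
```

The output is a 3×3 feature map in which each row is [0.0, 3.0, 0.0], matching the manual calculation above.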

Running the example first summarizes the model.

We can see from the model summary that the input to the pooling layer will be a single feature map with the shape (6,6) and that the output of the average pooling layer will be a single feature map with each dimension halved, with the shape (3,3).

Applying the average pooling results in a new feature map that still detects the line, although in a down sampled manner, exactly as we expected from calculating the operation manually.

Average pooling works well, although it is more common to use max pooling.

Max Pooling Layer

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.

The results are down sampled or pooled feature maps that highlight the most present feature in each patch, rather than the average presence of the feature as in the case of average pooling. Max pooling has been found to work better in practice than average pooling for computer vision tasks like image classification.

In a nutshell, the reason is that features tend to encode the spatial presence of some pattern or concept over the different tiles of the feature map (hence, the term feature map), and it’s more informative to look at the maximal presence of different features than at their average presence.

— Page 129, Deep Learning with Python, 2017.

We can make the max pooling operation concrete by again applying it to the output feature map of the line detector convolutional operation and manually calculate the first row of the pooled feature map.

The first line for pooling (the first two rows and all six columns) of the output feature map was as follows:

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

The first max pooling operation is applied as follows:

max(0.0, 0.0, 0.0, 0.0) = 0.0

Given the stride of two, the operation is moved along two columns to the right and the max is calculated:

max(3.0, 3.0, 3.0, 3.0) = 3.0

Again, the operation is moved along two columns to the right and the max is calculated:

max(0.0, 0.0, 0.0, 0.0) = 0.0

That's it for the first line of pooling operations.

The result is the first line of the max pooling operation:

[0.0, 3.0, 0.0]

Again, given the horizontal symmetry of the feature map provided for pooling, we would expect the pooled feature map to look as follows:

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]

It just so happens that the chosen line detector image and feature map produce the same output when downsampled with average pooling and maximum pooling.

The maximum pooling operation can be added to the worked example by adding the MaxPooling2D layer provided by the Keras API.

The complete example of vertical line detection with max pooling is listed below.
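As with average pooling, the Keras listing is not reproduced here; a plain-Python sketch of 2×2 max pooling with a (2,2) stride applied to the same 6×6 feature map:

```python
# the 6x6 feature map produced by the vertical line detector
feature_map = [[0.0, 0.0, 3.0, 3.0, 0.0, 0.0] for _ in range(6)]

def max_pool_2x2(fmap):
    """2x2 max pooling with a (2,2) stride, as MaxPooling2D applies by default."""
    pooled = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            patch = [fmap[r][c], fmap[r][c + 1],
                     fmap[r + 1][c], fmap[r + 1][c + 1]]
            row.append(max(patch))
        pooled.append(row)
    return pooled

pooled = max_pool_2x2(feature_map)
for row in pooled:
    print(row)
```

For this particular feature map, every patch contains a single repeated value, so the max pooled output (each row [0.0, 3.0, 0.0]) happens to equal the average pooled output.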

Running the example first summarizes the model.

We can see, as we might expect by now, that the output of the max pooling layer will be a single feature map with each dimension halved, with the shape (3,3).

Applying the max pooling results in a new feature map that still detects the line, although in a down sampled manner.

Global Pooling Layers

There is another type of pooling that is sometimes used called global pooling.

Instead of down sampling patches of the input feature map, global pooling down samples the entire feature map to a single value. This would be the same as setting the pool_size to the size of the input feature map.

Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models as an alternative to using a fully connected layer to transition from feature maps to an output prediction for the model.

Both global average pooling and global max pooling are supported by Keras via the GlobalAveragePooling2D and GlobalMaxPooling2D classes respectively.

For example, we can add global max pooling to the convolutional model used for vertical line detection.

The outcome will be a single value that will summarize the strongest activation or presence of the vertical line in the input image.

The complete code listing is provided below.
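The Keras listing (the line-detector model with a GlobalMaxPooling2D layer appended) is not shown here; this plain-Python sketch computes what global max pooling produces for the line-detector feature map:

```python
# the 6x6 feature map produced by the vertical line detector
feature_map = [[0.0, 0.0, 3.0, 3.0, 0.0, 0.0] for _ in range(6)]

# global max pooling: reduce the entire feature map to its single largest activation
global_max = max(value for row in feature_map for value in row)
print(global_max)  # 3.0
```

The single value 3.0 summarizes the strongest activation of the vertical line detector over the whole input image.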

Running the example first summarizes the model.

We can see that, as expected, the output of the global pooling layer is a single value that summarizes the presence of the feature in the single feature map.

Next, the output of the model is printed showing the effect of global max pooling on the feature map, printing the single largest activation.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Posts

Books

  • Deep Learning, 2016.
  • Deep Learning with Python, 2017.

API

Summary

In this tutorial, you discovered how the pooling operation works and how to implement it in convolutional neural networks.

Specifically, you learned:

  • Pooling can be used to down sample the detection of features in feature maps.
  • How to calculate and implement average and maximum pooling in a convolutional neural network.
  • How to use global pooling in a convolutional neural network.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.


68 Responses to A Gentle Introduction to Pooling Layers for Convolutional Neural Networks

  1. Bejoscha April 26, 2019 at 7:27 am #

    Thanks. An interesting read.

  2. jamila May 9, 2019 at 10:39 pm #

    I do not understand how global pooling works in coding results. please help

  3. jamila May 10, 2019 at 5:34 pm #

    I’m focusing on results. how it gives us a single value?

  4. Justin June 14, 2019 at 11:09 pm #

    Excellent article, thank you so much for writing it. It could be helpful to create a slight variation of your examples where average and max pooling produce different results :).

  5. LELA June 21, 2019 at 2:31 am #

    Case:1. if we apply average pooling then it will need to place all FC-layers and then softmax?
    Case2: if we apply the average pooling then it will need to feed the resulting vector directly into softmax?

    Case3: the sequence will look correct.. features maps – avr pooling – softmax? OR features map – avr pooling – FC-layers – Softmax?

    Case3: can we say that the services of average pooling can be achieved through GAP?

    Case4: in case of multi-CNN, how we will concatenate the features maps into the average pooling

    • Jason Brownlee June 21, 2019 at 6:40 am #

      Not sure I agree, they are all options, not requirements.

      What are you getting at exactly?

      • LELA June 21, 2019 at 2:44 pm #

        I am asking for classification/recognition when multiple CNNs are used.

        so, what will be the proper sequence to place all the operations what I mentioned above?

        Because, it is mentioned in the GAP research article, that when it is used then no need

        to use FC-layers. so what is the case in the average pool layer?

        (1): if we want to use CNN for images (classification/recognition task), can we use

        softmax classifier directly after the Average Pool Layer (skip the fully-connected layers)?

        (2): OR for classification/recognition for any input image, can we place FC-Layers after

        Average pool layer and Then Softmax?

        And the last query, for image classification/recognition, what will be the right option when

        multiple-CNN are used to extract the features from the images,

        Option 1: Average pooling layer or GAP
        Option2: Average pooling layer + Softmax?
        Option3: Average pooling layer + FC-layers+ Softmax?
        Option4: Features Maps + GAP?
        Option5: Features Maps + GAP + FC-layers + Softmax?

        Why I am asking in details because I read from multiple sources, but it was not quite clear that what exactly the proper procedure should be used, also, after reading I feel that average pooling and GAP can provide the same services.

        • Jason Brownlee June 22, 2019 at 6:28 am #

          There is no single best way. There are no rules and models differ, it is a good idea to experiment to see what works best for your specific dataset.

          You can use a softmax after global pooling or after a dense layer, or just a dense layer and no global pooling, or many other combinations.

          It might be a good idea to look at the architecture of some well performing models like VGG, ResNet, and Inception and try their proposed architectures in your model to see how they compare, or to get ideas.

  6. Rango September 1, 2019 at 1:35 pm #

    Great Article!!!

  7. JustVenky September 20, 2019 at 3:22 pm #

    can we use random forests for pooling

  8. RoyHJ November 2, 2019 at 11:08 pm #

    Thank you for the clear definitions and nice examples.

    A couple of questions about using global pooling at the end of a CNN model (before the fully connected as e.g. resnet):

    What would you say are the advantages/disadvantages of using global avg pooling vs global max pooling as a final layer of the feature extraction (are there cases where max would be prefered)?

    When switching between the two, how does it affect hyper parameters such as learning rate and weight regularization? (since max doesn’t pass gradients through all of the features, opposed to avg?)

    You wrote: “Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models **as an alternative** to using a fully connected layer to transition from feature maps to an output prediction for the model.”

    Wouldn’t it be more accurate to say that (usually in the cnn domain) global pooling is sometimes added *before* (i.e. in addition) a fully connected (fc) layer in the transition from feature maps to an output prediction for the model (both giving the features global attention and reducing computation of the fc layer)?

    In order for global pooling to replace the last fc layer, you would need to equalize the number of channels to the number of classes first (e.g. 1×1 conv?), this would be heavier (computationally-wise) and a somewhat different operation than adding a fc after the global pool (e.g. as it’s done in common cnn models with a final global pooling layer). Is this actually ever done this way?

    • Jason Brownlee November 3, 2019 at 5:58 am #

      Thanks.

      You could probably construct post hoc arguments about the differences. I'd recommend testing them both and using the results to guide you.

      No, global pooling is used instead of a fully connected layer – they are used as output layers. Inspect some of the classical models to confirm.

      It does, they output a vector.

  9. AH December 10, 2019 at 11:55 pm #

    Thanks, it is really nice explanation of pooling. Very readable and informative thanks to the examples.

  10. Vinay January 26, 2020 at 12:47 am #

    What does the below sentence about pooling layers mean?

    “This means that small movements in the position of the feature in the input image will result in a different feature map.

    • Jason Brownlee January 26, 2020 at 5:19 am #

      It means that slightly different images that look the same to our eyes look very different to the model.

      • Amatul Saboor July 20, 2020 at 4:37 pm #

        Then how does it recognize an image as a dog that does have a dog in it but not in the center? This means those huge movements in the position of the dog’s feature in the input image will look very much different to the model.

        • Jason Brownlee July 21, 2020 at 5:53 am #

          Yes, a property of the CNN architecture is that it is invariant to the position of features in the input, e.g. if the model knows what a dog is, then the dog can appear almost anywhere, in any position, and still be detected correctly (within limits).

          • Amatul Saboor July 22, 2020 at 4:30 am #

            Yes, I understand. My question is how a CNN is invariant to the position of features in the input? With the pooling layers, only the problem of a slight difference in the input can be solved (as you mentioned above). Then how this big difference in position (from the center to the corner) is solved?? Do we have any other type of layer to do this?

          • Jason Brownlee July 22, 2020 at 5:45 am #

            The conv and pooling layers when stacked achieve feature invariance together.

            Perhaps I don’t understand your question.

  11. Shailu February 21, 2020 at 7:17 am #

    Hello Jason, I am working on training convolutional neural network through transfer learning. I want to find the mean of the inter-class standard deviation for each convolutional layer to identify the best convolutional layer to freeze. Any help would be appreciated?

    • Jason Brownlee February 21, 2020 at 8:32 am #

      Not sure I follow, sorry. Perhaps post your question to stackoverflow?

  12. Sagnik February 26, 2020 at 8:10 pm #

    Hi Jason
    Great post!
    One of the frequently asked questions is why do we need a pooling operation after convolution in a CNN. The fact that you highlighted, making the image detector translation-invariant, is a very important point.
    Are there methods to make the detector rotation-invariant as well?
    Thanks

    • Jason Brownlee February 27, 2020 at 5:43 am #

      Thanks.

      Yes, train with rotated versions of the images. This is called data augmentation.

  13. Chinmay March 7, 2020 at 6:28 pm #

    Hi,

    Thanks for the amazing post.

    I have one doubt. I am building my own CNN and i am using max pooling. I did understand the forward propagation from the explanation.

    I was wondering about backward propagation, we save the index value of the maximum and insert ‘1’ for that index. But for the example you showed, it has all values as same. example ‘0’ in the first 2 x 2 cell. So do we insert ‘1’ for all the zeros here or any random ‘0’. Similarly if have 2 x 2 cell which has all the same value(0.9). So again do we insert ‘1’ for all the same value of ‘0.9’ or random.

    Thanks again for the post

    • Jason Brownlee March 8, 2020 at 6:08 am #

      Thanks.

      Sorry, I don’t quite follow your question. Perhaps you can rephrase it?

  14. Chinmay Appa Rane March 10, 2020 at 10:52 am #

    Hi,

    Thanks for your reply.

    Sorry for confusion. i was wondering about the backpropagation for the Max pooling example you mentioned.

    [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
    [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

    the forward propagation for above matrix is,

    [0.0, 3.0, 0.0]

    So, is the derivative of the matrix(i.e ‘1’ to the largest value we picked during forward propagation)

    But if all the values of the 2 x 2 matrix for pooling are same

    Is it ‘1’ for any random value of ‘3.0’ i.e maximum
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

    or

    ‘1’ for all the maximum values
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

    ***Also, i assume for all zeros the derivative is ‘0’(not sure)

    Thanks again

    Chinmay

    • Jason Brownlee March 10, 2020 at 1:39 pm #

      Pooling layers do not have any weights, e.g. they are not involved in the learning.

      • Chinmay Appa Rane March 11, 2020 at 4:42 pm #

        Thank you for your reply. I was confused about the same as i read some CNN posts that we need to save the index numbers of the maximum values we choose after pooling. the post didn’t mentioned properly the use of saving the index values so i assumed they are used during back propagation.

        Thank you for clearing the doubt

  15. Blane May 14, 2020 at 12:15 am #

    Interesting, but it would be simpler and more useful if you just used an eight by eight pixel image and showed the outputs. this is too abstract for concepts which are already abstract

  16. Dhruv September 11, 2020 at 4:50 am #

    Hello sir, the tutorial was amazing, but I had a doubt.

    In the starting of the tutorial, you said “This means that small movements in the position of the feature in the input image will result in a different feature map”. Why do we even care if it’s a different feature map because it would still have all of it’s features in it as the previous time, but just at a different location now.

    Eg: Imagine, we have a kernel that detects ‘lips’, we trained it on images of lips, where in all images, the lips were present in the center of the image. After training, we would have a kernel that could detect lips. Now if we show an image where lips is present at the top right, it would still do a good job because it is a kernel that detects lips.

    So, even if the location of the features in the feature map changes, the CNN should still do a good job.

    So, why do we care if it’s a different feature map, when it still contains all the same features, but at a different location?

    • Jason Brownlee September 11, 2020 at 6:03 am #

      Thanks!

      Great question. We care because the model will extract different features – making the data inconsistent when in fact it is consistent. This makes learning harder and model performance worse.

      • Dhruv September 11, 2020 at 11:49 pm #

        Thank you for your reply.

        By ‘different features’, do you mean that the model will extract different sets of features for an image that has been changed a little from the one with no change?

        • Jason Brownlee September 12, 2020 at 6:16 am #

          Yes, rotated versions of the same image might mean extracting different features.

          If we use pooling we may achieve some rotation invariance in feature extraction.

  17. Dhruv September 12, 2020 at 1:54 pm #

    ahh I see. Thank you. You really are a master of machine learning.

  18. Gabriel October 1, 2020 at 3:48 am #

    Hello Jason! Thanks for all the tutorials you have done! I am new to Data Science and I am studying it on my own, so your posts have been really, really useful to me.

    I have one question, though. Is there any situation you would not recommend using pooling layers in a CNN? Or are they one of those things that “it never hurts to have one”?

    • Jason Brownlee October 1, 2020 at 6:32 am #

      Not really. If you are unsure for your model, compare performance with and without the layers and use whatever results in the best performance.

  19. Paweł December 22, 2020 at 5:37 am #

    Hi,
    Trying to wrap my head around it and understand a bit more how ccn like yolo works, what I kind of get is the convolution part – in another words the detecting and categorisation, but i dont really get how such networks marks detected subjects by drawing border around them.

    This probably is far more complicated 😉 but maybe you can push me in some direction.

  20. Anuj February 22, 2021 at 8:37 am #

    Sir, are your books written and updated in
    tensorflow 2.0
    ….because I’m planning to buy Deep learning.Please reply sir.????

  21. Hosein February 22, 2021 at 6:57 pm #

    Hello Janson,

    I wonder which one is better after some ResBlocks for a grayscale image, global sum pooling, or global max pooling?

    tnx,
    Hosein

    • Jason Brownlee February 23, 2021 at 6:18 am #

      Perhaps test and compare on your specific dataset.

  22. shimaa February 25, 2021 at 9:15 pm #

    i want to ask if the pooling don’t affect the back propagation (derivatives) calculations ?

    • shimaa February 25, 2021 at 9:19 pm #

      pooling layers * i mean

    • Jason Brownlee February 26, 2021 at 4:58 am #

      Yes, it changes the structure of the data flow at that point of the network.

  23. Hind Almisbahi April 22, 2021 at 12:12 am #

    Thank you so much Jason. Your articles are really helpful to get clear intuition. Any time I need to understand a concept in machine learning, I search first in your articles.

  24. Yash raj July 7, 2021 at 4:08 am #

    In the ‘Detecting vertical lines’ code, data.reshape(1, 8, 8, 1) has 4 parameters. I understood (8, 8, 1) meant 8 pixels by 8 pixels and 1 channel, but what is the purpose of the other ‘1’.
    P.s. please go easy on me I’m new to AI :).

    • Jason Brownlee July 7, 2021 at 5:35 am #

      It is 1 sample, e.g. one image. We often work with many images in a batch. It’s more efficient.

  25. Najeh November 26, 2021 at 6:05 am #

    How to perform global sum pooling in pytorch (with and without the view() function). Thank you

    • Adrian Tam November 29, 2021 at 8:28 am #

      Use average pooling instead. It is the same as sum pooling up to a constant scaling factor.

  26. Erfan December 15, 2021 at 4:07 am #

    why max pooling is better??!

    • Adrian Tam December 15, 2021 at 7:24 am #

      Not always better. It has just been found experimentally to work better in computer vision tasks.

  27. H March 6, 2022 at 7:57 am #

    Hello
    Can you give me the MATLAB code for average pooling?

    • James Carmichael March 6, 2022 at 1:03 pm #

      Hi H…I do not have tutorials in Octave or Matlab.

      I believe Octave and Matlab are excellent platforms for learning how machine learning algorithms work in an academic setting.

      I do not think that they are good platforms for applied machine learning in industry, which is the focus of my website.
