Last Updated on July 5, 2019
Convolutional layers in a convolutional neural network summarize the presence of features in an input image.
A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to down sample the feature maps. This has the effect of making the resulting down sampled feature maps more robust to changes in the position of the feature in the image, referred to by the technical phrase “local translation invariance.”
Pooling layers provide an approach to down sampling feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling that summarize the average presence of a feature and the most activated presence of a feature respectively.
In this tutorial, you will discover how the pooling operation works and how to implement it in convolutional neural networks.
After completing this tutorial, you will know:
- Pooling is required to down sample the detection of features in feature maps.
- How to calculate and implement average and maximum pooling in a convolutional neural network.
- How to use global pooling in a convolutional neural network.
Kick-start your project with my new book Deep Learning for Computer Vision, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.

A Gentle Introduction to Pooling Layers for Convolutional Neural Networks
Photo by Nicholas A. Tonelli, some rights reserved.
Tutorial Overview
This tutorial is divided into five parts; they are:
- Pooling
- Detecting Vertical Lines
- Average Pooling Layers
- Max Pooling Layers
- Global Pooling Layers
Pooling Layers
Convolutional layers in a convolutional neural network systematically apply learned filters to input images in order to create feature maps that summarize the presence of those features in the input.
Convolutional layers prove very effective, and stacking convolutional layers in deep models allows layers close to the input to learn low-level features (e.g. lines) and layers deeper in the model to learn high-order or more abstract features, like shapes or specific objects.
A limitation of the feature map output of convolutional layers is that they record the precise position of features in the input. This means that small movements in the position of the feature in the input image will result in a different feature map. This can happen with re-cropping, rotation, shifting, and other minor changes to the input image.
A common approach to addressing this problem from signal processing is called down sampling. This is where a lower resolution version of an input signal is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task.
Down sampling can be achieved with convolutional layers by changing the stride of the convolution across the image. A more robust and common approach is to use a pooling layer.
A pooling layer is a new layer added after the convolutional layer, specifically after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by a convolutional layer. For example, the layers in a model may look as follows:
- Input Image
- Convolutional Layer
- Nonlinearity
- Pooling Layer
The addition of a pooling layer after the convolutional layer is a common pattern used for ordering layers within a convolutional neural network that may be repeated one or more times in a given model.
The pooling layer operates upon each feature map separately to create a new set of the same number of pooled feature maps.
Pooling involves selecting a pooling operation, much like a filter to be applied to feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels.
With a 2×2 filter and a stride of 2, the pooling layer will reduce the size of each feature map by a factor of 2, e.g. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter of the original size. For example, a pooling layer applied to a 6×6 feature map (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).
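The arithmetic above can be sketched in a few lines of plain Python (a sketch for illustration, not part of the tutorial's Keras code), using the standard output-size formula for a pooling window of a given size and stride:

```python
# Sketch: output size of pooling along one dimension of a feature map,
# using the standard formula floor((size - pool) / stride) + 1.
def pooled_size(size, pool=2, stride=2):
	return (size - pool) // stride + 1

print(pooled_size(6))  # 6x6 feature map -> 3x3, as in the text
print(pooled_size(8))  # 8x8 -> 4x4
```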
The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:
- Average Pooling: Calculate the average value for each patch on the feature map.
- Maximum Pooling (or Max Pooling): Calculate the maximum value for each patch of the feature map.
The result of using a pooling layer and creating down sampled or pooled feature maps is a summarized version of the features detected in the input. They are useful as small changes in the location of the feature in the input detected by the convolutional layer will result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model’s invariance to local translation.
In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.
— Page 342, Deep Learning, 2016.
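The quoted property can be illustrated with a tiny one-dimensional sketch (plain Python, with made-up activation values): after shifting the activations by one position, most of the max-pooled outputs do not change.

```python
# Max pooling (pool size 2, stride 2) over a 1D list of activations.
def max_pool_1d(x, pool=2, stride=2):
	return [max(x[i:i + pool]) for i in range(0, len(x) - pool + 1, stride)]

a = [0.0, 0.0, 0.0, 3.0, 3.0, 0.0, 0.0, 0.0]  # a detected feature
b = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0]  # same feature shifted one step left

print(max_pool_1d(a))  # [0.0, 3.0, 3.0, 0.0]
print(max_pool_1d(b))  # [0.0, 3.0, 0.0, 0.0] -> 3 of the 4 pooled values unchanged
```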
Now that we are familiar with the need and benefit of pooling layers, let’s look at some specific examples.
Detecting Vertical Lines
Before we look at some examples of pooling layers and their effects, let’s develop a small example of an input image and convolutional layer to which we can later add and evaluate pooling layers.
In this example, we define a single input image or sample that has one channel and is an 8 pixel by 8 pixel square with all 0 values and a two-pixel wide vertical line in the center.
```python
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
```
Next, we can define a model that expects input samples to have the shape (8, 8, 1) and has a single hidden convolutional layer with a single filter with the shape of 3 pixels by 3 pixels.
A rectified linear activation function, or ReLU for short, is then applied to each value in the feature map. This is a simple and effective nonlinearity that, in this case, will not change the values in the feature map. It is included because we will later add pooling layers, and pooling is applied after the nonlinearity as a best practice.
```python
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
# summarize model
model.summary()
```
The filter is initialized with random weights as part of the initialization of the model.
Instead, we will hard code our own 3×3 filter that will detect vertical lines. That is, the filter will strongly activate when it detects a vertical line and weakly activate when it does not. We expect that by applying this filter across the input image, the output feature map will show that the vertical line was detected.
```python
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
```
Next, we can apply the filter to our input image by calling the predict() function on the model.
```python
# apply filter to input data
yhat = model.predict(data)
```
The result is a four-dimensional output with one batch, a given number of rows and columns, and one filter, or [batch, rows, columns, filters]. We can print the activations in the single feature map to confirm that the line was detected.
```python
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
```
Tying all of this together, the complete example is listed below.
```python
# example of vertical line detection with a convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
```
Running the example first summarizes the structure of the model.
Of note is that the single hidden convolutional layer will take the 8×8 pixel input image and will produce a feature map with the dimensions of 6×6.
We can also see that the layer has 10 parameters: that is, nine weights for the filter (3×3) and one weight for the bias.
Finally, the single feature map is printed.
We can see from reviewing the numbers in the 6×6 matrix that indeed the manually specified filter detected the vertical line in the middle of our input image.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
```
We can now look at some common approaches to pooling and how they impact the output feature maps.
Average Pooling Layer
On two-dimensional feature maps, pooling is typically applied in 2×2 patches of the feature map with a stride of (2,2).
Average pooling involves calculating the average for each patch of the feature map. This means that each 2×2 square of the feature map is down sampled to the average value in the square.
For example, the output of the line detector convolutional filter in the previous section was a 6×6 feature map. We can look at applying the average pooling operation to the first line of that feature map manually.
The first line for pooling (the first two rows and all six columns) of the output feature map was as follows:
```
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
```
The first pooling operation is applied as follows:
```
average(0.0, 0.0,
        0.0, 0.0) = 0.0
```
Given the stride of two, the operation is moved along two columns to the right and the average is calculated:
```
average(3.0, 3.0,
        3.0, 3.0) = 3.0
```
Again, the operation is moved along two columns to the right and the average is calculated:
```
average(0.0, 0.0,
        0.0, 0.0) = 0.0
```
That’s it for the first line of pooling operations. The result is the first line of the average pooling operation:
```
[0.0, 3.0, 0.0]
```
Given the (2,2) stride, the operation would then be moved down two rows and back to the first column and the process continued.
Because the downsampling operation halves each dimension, we will expect the output of pooling applied to the 6×6 feature map to be a new 3×3 feature map. Given the horizontal symmetry of the feature map input, we would expect each row to have the same average pooling values. Therefore, we would expect the resulting average pooling of the detected line feature map from the previous section to look as follows:
```
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
```
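Before turning to Keras, this expectation can be checked with a small NumPy sketch (not part of the tutorial's code) that averages each non-overlapping 2×2 patch of the 6×6 feature map:

```python
from numpy import array

# the 6x6 feature map produced by the vertical line detector
fmap = array([[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]] * 6)

# group into non-overlapping 2x2 patches and average each patch
pooled = fmap.reshape(3, 2, 3, 2).mean(axis=(1, 3))
print(pooled)  # three rows of [0.0, 3.0, 0.0]
```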
We can confirm this by updating the example from the previous section to use average pooling.
This can be achieved in Keras by using the AveragePooling2D layer. The default pool_size (e.g. like the kernel size or filter size) of the layer is (2,2) and the default strides is None, which in this case means using the pool_size as the strides, which will be (2,2).
```python
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(AveragePooling2D())
```
The complete example with average pooling is listed below.
```python
# example of average pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(AveragePooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
```
Running the example first summarizes the model.
We can see from the model summary that the input to the pooling layer will be a single feature map with the shape (6,6) and that the output of the average pooling layer will be a single feature map with each dimension halved, with the shape (3,3).
Applying the average pooling results in a new feature map that still detects the line, although in a down sampled manner, exactly as we expected from calculating the operation manually.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
average_pooling2d_1 (Average (None, 3, 3, 1)           0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
```
Average pooling works well, although it is more common to use max pooling.
Max Pooling Layer
Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.
The results are down sampled or pooled feature maps that highlight the most present feature in the patch, not the average presence of the feature in the case of average pooling. This has been found to work better in practice than average pooling for computer vision tasks like image classification.
In a nutshell, the reason is that features tend to encode the spatial presence of some pattern or concept over the different tiles of the feature map (hence, the term feature map), and it’s more informative to look at the maximal presence of different features than at their average presence.
— Page 129, Deep Learning with Python, 2017.
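A tiny sketch (with made-up values, purely for illustration) shows the difference the quote describes: on a patch where the feature is only partially present, max pooling keeps the strong activation while average pooling dilutes it.

```python
# a flattened 2x2 patch where the feature activates in only one position
patch = [0.0, 0.0, 0.0, 3.0]

print(max(patch))               # 3.0 -> max pooling reports the feature as present
print(sum(patch) / len(patch))  # 0.75 -> average pooling washes it out
```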
We can make the max pooling operation concrete by again applying it to the output feature map of the line detector convolutional operation and manually calculate the first row of the pooled feature map.
The first line for pooling (the first two rows and all six columns) of the output feature map was as follows:
```
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
```
The first max pooling operation is applied as follows:
```
max(0.0, 0.0,
    0.0, 0.0) = 0.0
```
Given the stride of two, the operation is moved along two columns to the right and the max is calculated:
```
max(3.0, 3.0,
    3.0, 3.0) = 3.0
```
Again, the operation is moved along two columns to the right and the max is calculated:
```
max(0.0, 0.0,
    0.0, 0.0) = 0.0
```
That’s it for the first line of pooling operations.
The result is the first line of the max pooling operation:
```
[0.0, 3.0, 0.0]
```
Again, given the horizontal symmetry of the feature map provided for pooling, we would expect the pooled feature map to look as follows:
```
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
```
It just so happens that the chosen line detector image and feature map produce the same output when downsampled with average pooling and maximum pooling.
The maximum pooling operation can be added to the worked example by adding the MaxPooling2D layer provided by the Keras API.
```python
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(MaxPooling2D())
```
The complete example of vertical line detection with max pooling is listed below.
```python
# example of max pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(MaxPooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])
```
Running the example first summarizes the model.
We can see, as we might expect by now, that the output of the max pooling layer will be a single feature map with each dimension halved, with the shape (3,3).
Applying the max pooling results in a new feature map that still detects the line, although in a down sampled manner.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 1)           0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
```
Global Pooling Layers
There is another type of pooling that is sometimes used called global pooling.
Instead of down sampling patches of the input feature map, global pooling down samples the entire feature map to a single value. This would be the same as setting the pool_size to the size of the input feature map.
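A NumPy sketch of this idea (not part of the tutorial's code), using the 6×6 line-detector feature map: global pooling is simply a reduction over the entire spatial extent of the map.

```python
from numpy import array

# the 6x6 feature map produced by the vertical line detector
fmap = array([[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]] * 6)

print(fmap.max())   # 3.0 -> global max pooling
print(fmap.mean())  # 1.0 -> global average pooling
```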
Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models as an alternative to using a fully connected layer to transition from feature maps to an output prediction for the model.
Both global average pooling and global max pooling are supported by Keras via the GlobalAveragePooling2D and GlobalMaxPooling2D classes respectively.
For example, we can add global max pooling to the convolutional model used for vertical line detection.
```python
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(GlobalMaxPooling2D())
```
The outcome will be a single value that will summarize the strongest activation or presence of the vertical line in the input image.
The complete code listing is provided below.
```python
# example of using global max pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import GlobalMaxPooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0],
	[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(GlobalMaxPooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# print the single pooled value
print(yhat)
```
Running the example first summarizes the model.
We can see that, as expected, the output of the global pooling layer is a single value that summarizes the presence of the feature in the single feature map.
Next, the output of the model is printed showing the effect of global max pooling on the feature map, printing the single largest activation.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 1)                 0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[[3.]]
```
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
- Chapter 9: Convolutional Networks, Deep Learning, 2016.
- Chapter 5: Deep Learning for Computer Vision, Deep Learning with Python, 2017.
Summary
In this tutorial, you discovered how the pooling operation works and how to implement it in convolutional neural networks.
Specifically, you learned:
- Pooling is required to down sample the detection of features in feature maps.
- How to calculate and implement average and maximum pooling in a convolutional neural network.
- How to use global pooling in a convolutional neural network.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Thanks. An interesting read.
Thanks.
I do not understand how global pooling works in the code results. Please help.
Which part don’t you understand exactly?
I’m focusing on the results. How does it give us a single value?
Global average pooling gives a single output because it calculates the average of all the inputs in the feature map.
What algorithms can we use in the convolutional layer?
You can discover how convolutional layers work in this tutorial:
https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
Hi, can you help me understand the training phase of an SVM for two-class classification?
Start here:
https://machinelearningmastery.com/support-vector-machines-for-machine-learning/
Excellent article, thank you so much for writing it. It could be helpful to create a slight variation of your examples where average and max pooling produce different results :).
Great suggestion, thanks Justin.
Case:1. if we apply average pooling then it will need to place all FC-layers and then softmax?
Case2: if we apply the average pooling then it will need to feed the resulting vector directly into softmax?
Case3: the sequence will look correct.. features maps – avr pooling – softmax? OR features map – avr pooling – FC-layers – Softmax?
Case3: can we say that the services of average pooling can be achieved through GAP?
Case4: in case of multi-CNN, how we will concatenate the features maps into the average pooling
Not sure I agree, they are all options, not requirements.
What are you getting at exactly?
I am asking for classification/recognition when multiple CNNs are used.
So, what will be the proper sequence to place all the operations I mentioned above?
It is mentioned in the GAP research article that when GAP is used there is no need
to use FC layers. So what is the case for the average pooling layer?
(1): if we want to use CNN for images (classification/recognition task), can we use
softmax classifier directly after the Average Pool Layer (skip the fully-connected layers)?
(2): OR for classification/recognition for any input image, can we place FC-Layers after
Average pool layer and Then Softmax?
And the last query, for image classification/recognition, what will be the right option when
multiple-CNN are used to extract the features from the images,
Option 1: Average pooling layer or GAP
Option2: Average pooling layer + Softmax?
Option3: Average pooling layer + FC-layers+ Softmax?
Option4: Features Maps + GAP?
Option5: Features Maps + GAP + FC-layers + Softmax?
Why I am asking in details because I read from multiple sources, but it was not quite clear that what exactly the proper procedure should be used, also, after reading I feel that average pooling and GAP can provide the same services.
There is no single best way. There are no rules and models differ, it is a good idea to experiment to see what works best for your specific dataset.
You can use a softmax after global pooling or a dense layer, or just a dense layer and no global pooling, or many other combinations.
It might be a good idea to look at the architecture of some well-performing models like VGG, ResNet, and Inception and try their proposed architectures in your model to see how they compare, or to get ideas.
Great Article!!!
Thanks, I’m glad it helped!
Can we use random forests for pooling?
No.
Thank you for the clear definitions and nice examples.
A couple of questions about using global pooling at the end of a CNN model (before the fully connected as e.g. resnet):
What would you say are the advantages/disadvantages of using global avg pooling vs global max pooling as a final layer of the feature extraction (are there cases where max would be prefered)?
When switching between the two, how does it affect hyper parameters such as learning rate and weight regularization? (since max doesn’t pass gradients through all of the features, opposed to avg?)
You wrote: “Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models **as an alternative** to using a fully connected layer to transition from feature maps to an output prediction for the model.”
Wouldn’t it be more accurate to say that (usually in the cnn domain) global pooling is sometimes added *before* (i.e. in addition) a fully connected (fc) layer in the transition from feature maps to an output prediction for the model (both giving the features global attention and reducing computation of the fc layer)?
In order for global pooling to replace the last fc layer, you would need to equalize the number of channels to the number of classes first (e.g. 1×1 conv?), this would be heavier (computationally-wise) and a somewhat different operation than adding a fc after the global pool (e.g. as it’s done in common cnn models with a final global pooling layer). Is this actually ever done this way?
Thanks.
You could probably construct post hoc arguments about the differences. I’d recommend testing them both and using the results to guide you.
No, global pooling is used instead of a fully connected layer – they are used as output layers. Inspect some of the classical models to confirm.
It does, they output a vector.
Thanks, it is really nice explanation of pooling. Very readable and informative thanks to the examples.
Thanks, I’m happy it helped.
What does the below sentence about pooling layers mean?
“This means that small movements in the position of the feature in the input image will result in a different feature map.
“
It means that slightly different images that look the same to our eyes can look very different to the model.
Then how does it recognize an image as a dog that does have a dog in it but not in the center? This means those huge movements in the position of the dog’s feature in the input image will look very much different to the model.
Yes, a property of the CNN architecture is that it is invariant to the position of features in the input, e.g. if the model knows what a dog is, then the dog can appear almost anywhere and in any position and still be detected correctly (within limits).
Yes, I understand. My question is how a CNN is invariant to the position of features in the input? With the pooling layers, only the problem of a slight difference in the input can be solved (as you mentioned above). Then how this big difference in position (from the center to the corner) is solved?? Do we have any other type of layer to do this?
The conv and pooling layers when stacked achieve feature invariance together.
Perhaps I don’t understand your question.
Hello Jason, I am working on training convolutional neural network through transfer learning. I want to find the mean of the inter-class standard deviation for each convolutional layer to identify the best convolutional layer to freeze. Any help would be appreciated?
Not sure I follow, sorry. Perhaps post your question to stackoverflow?
Hi Jason
Great post!
One of the frequently asked questions is why do we need a pooling operation after convolution in a CNN. The fact that you highlighted, making the image detector translation-invariant, is a very important point.
Are there methods to make the detector rotation-invariant as well?
Thanks
Thanks.
Yes, train with rotated versions of the images. This is called data augmentation.
Hi,
Thanks for the amazing post.
I have one doubt. I am building my own CNN and i am using max pooling. I did understand the forward propagation from the explanation.
I was wondering about backward propagation: we save the index of the maximum value and insert ‘1’ at that index. But in the example you showed, all the values in a patch are the same, e.g. ‘0’ in the first 2 x 2 cell. So do we insert ‘1’ for all the zeros, or for one of them at random? Similarly, if a 2 x 2 cell has all the same value (say 0.9), do we insert ‘1’ for all of them or for one at random?
Thanks again for the post
Thanks.
Sorry, I don’t quite follow your question. Perhaps you can rephrase it?
Hi,
Thanks for your reply.
Sorry for confusion. i was wondering about the backpropagation for the Max pooling example you mentioned.
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
the forward propagation for above matrix is,
[0.0, 3.0, 0.0]
So, is the derivative of the matrix(i.e ‘1’ to the largest value we picked during forward propagation)
But if all the values of the 2 x 2 matrix for pooling are same
Is it ‘1’ for any random value of ‘3.0’ i.e maximum
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
or
a ‘1’ for all the maximum values:
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
***Also, I assume the derivative is ‘0’ for all the zeros (not sure).
Thanks again
Chinmay
Pooling layers do not have any weights, i.e. they are not involved in the learning.
Thank you for your reply. I was confused about the same thing, as I read some CNN posts saying we need to save the index numbers of the maximum values chosen during pooling. The posts didn’t properly explain the purpose of saving the index values, so I assumed they are used during backpropagation.
Thank you for clearing the doubt
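For what it is worth, the tie-breaking question in this thread can be sketched in plain NumPy. This is a minimal illustration of one common convention, routing the gradient to the first maximum only (as np.argmax does when values tie); it is not Keras’s internal implementation, and some frameworks may handle ties differently:

```python
import numpy as np

def max_pool_backward(patch, upstream_grad=1.0):
    """Route the upstream gradient to the argmax of a single pooling patch.

    On ties, np.argmax returns the first maximum in row-major order,
    so the gradient goes to exactly one element, not to all ties.
    """
    grad = np.zeros_like(patch, dtype=float)
    idx = np.unravel_index(np.argmax(patch), patch.shape)
    grad[idx] = upstream_grad
    return grad

# A 2x2 pooling window whose values all tie for the maximum.
patch = np.array([[3.0, 3.0],
                  [3.0, 3.0]])
print(max_pool_backward(patch))
# [[1. 0.]
#  [0. 0.]]
```

Under this convention the gradient mass for each pooled output is 1, placed at a single position, rather than being duplicated across every tied maximum.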
Interesting, but it would be simpler and more useful if you just used an eight by eight pixel image and showed the outputs. This is too abstract for concepts which are already abstract.
Thanks for the suggestion.
Hello sir, the tutorial was amazing, but I had a doubt.
At the start of the tutorial, you said “This means that small movements in the position of the feature in the input image will result in a different feature map”. Why do we even care if it’s a different feature map, when it would still have all of its features, just at a different location?
E.g. imagine we have a kernel that detects lips, and we trained it on images where the lips were always in the center. After training, we would have a kernel that can detect lips. Now if we show it an image where the lips are at the top right, it would still do a good job, because it is a kernel that detects lips.
So, even if the location of the features in the feature map changes, the CNN should still do a good job.
So, why do we care if it’s a different feature map, when it still contains all the same features, but at a different location?
Thanks!
Great question. We care because the model will extract different features – making the data inconsistent when in fact it is consistent. This makes learning harder and model performance worse.
Thank you for your reply.
By ‘different features’, do you mean that the model will extract a different set of features for an image that has been changed slightly, compared to the unchanged one?
Yes, rotated versions of the same image might mean extracting different features.
If we use pooling, we may achieve some rotation invariance in feature extraction.
ahh I see. Thank you. You really are a master of machine learning.
No, just a simple human.
Hello Jason! Thanks for all the tutorials you have done! I am new to Data Science and I am studying it on my own, so your posts have been really, really useful to me.
I have one question, though. Is there any situation you would not recommend using pooling layers in a CNN? Or are they one of those things that “it never hurts to have one”?
Not really. If you are unsure for your model, compare performance with and without the layers and use whatever results in the best performance.
Hi,
Trying to wrap my head around it and understand a bit more how CNNs like YOLO work. What I kind of get is the convolution part, in other words the detection and categorisation, but I don’t really get how such networks mark detected subjects by drawing borders around them.
This is probably far more complicated 😉 but maybe you can push me in some direction.
Perhaps start here:
https://machinelearningmastery.com/object-recognition-with-deep-learning/
Sir, are your books written and updated for TensorFlow 2.0? I ask because I’m planning to buy the Deep Learning book. Please reply, sir. 👍
Yes, this is a common question that I answer here:
https://machinelearningmastery.com/faq/single-faq/do-you-support-tensorflow-2
Hello Jason,
I wonder which one is better after some ResBlocks for a grayscale image: global sum pooling or global max pooling?
tnx,
Hosein
Perhaps test and compare on your specific dataset.
I want to ask whether pooling affects the backpropagation (derivative) calculations?
pooling layers*, I mean
Yes, it changes the structure of the data flow at that point of the network.
Thank you so much Jason. Your articles are really helpful to get clear intuition. Any time I need to understand a concept in machine learning, I search first in your articles.
Thanks.
In the ‘Detecting vertical lines’ code, data.reshape(1, 8, 8, 1) has 4 parameters. I understood (8, 8, 1) to mean 8 pixels by 8 pixels and 1 channel, but what is the purpose of the other ‘1’?
P.s. please go easy on me I’m new to AI :).
It is 1 sample, i.e. one image. We often work with many images in a batch. It’s more efficient.
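A small sketch of what that reshape does, assuming the 8x8 grayscale image from the tutorial:

```python
import numpy as np

# An 8x8 single-channel "image".
image = np.zeros((8, 8))

# Keras convolutional layers expect a 4D tensor with the shape
# (samples, rows, cols, channels). The leading 1 means a batch
# containing one image; the trailing 1 means one (grayscale) channel.
batch = image.reshape(1, 8, 8, 1)
print(batch.shape)  # (1, 8, 8, 1)
```

A batch of 32 such images would have the shape (32, 8, 8, 1); the layout is the same, only the sample dimension grows.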
How do you perform global sum pooling in PyTorch (with and without the view() function)? Thank you.
Use average pooling instead. It is the same as sum pooling up to a constant scaling factor.
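The equivalence is easy to check numerically. A framework-agnostic sketch with NumPy (not PyTorch-specific): the global sum equals the global average multiplied by the number of spatial positions, a constant scale that the next layer’s weights can absorb.

```python
import numpy as np

# A single 4x4 feature map.
feature_map = np.arange(16, dtype=float).reshape(4, 4)

sum_pool = feature_map.sum()    # global sum pooling
avg_pool = feature_map.mean()   # global average pooling

print(sum_pool)                        # 120.0
print(avg_pool * feature_map.size)     # 120.0 -- same up to a constant factor
```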
Why is max pooling better?
It is not always better. It has just experimentally been found to work better in computer vision tasks.
Hello
Can you give me the MATLAB code for average pooling?
Hi H…I do not have tutorials in Octave or Matlab.
I believe Octave and Matlab are excellent platforms for learning how machine learning algorithms work in an academic setting.
I do not think that they are good platforms for applied machine learning in industry, which is the focus of my website.