A Gentle Introduction to Pooling Layers for Convolutional Neural Networks

By Jason Brownlee on July 5, 2019 in Deep Learning for Computer Vision

Convolutional layers in a convolutional neural network summarize the presence of features in an input image.

A problem with the output feature maps is that they are sensitive to the location of the features in the input. One approach to address this sensitivity is to down sample the feature maps. This has the effect of making the resulting down sampled feature maps more robust to changes in the position of the feature in the image, referred to by the technical phrase “local translation invariance.”

Pooling layers provide an approach to down sampling feature maps by summarizing the presence of features in patches of the feature map. Two common pooling methods are average pooling and max pooling, which summarize the average presence of a feature and the most activated presence of a feature, respectively.

In this tutorial, you will discover how the pooling operation works and how to implement it in convolutional neural networks.

After completing this tutorial, you will know:

Pooling can be used to down sample the detection of features in feature maps.
How to calculate and implement average and maximum pooling in a convolutional neural network.
How to use global pooling in a convolutional neural network.

Let’s get started.

Tutorial Overview

This tutorial is divided into five parts; they are:

Pooling Layers
Detecting Vertical Lines
Average Pooling Layer
Max Pooling Layer
Global Pooling Layers

Pooling Layers

Convolutional layers in a convolutional neural network systematically apply learned filters to input images in order to create feature maps that summarize the presence of those features in the input.

Convolutional layers prove very effective, and stacking convolutional layers in deep models allows layers close to the input to learn low-level features (e.g. lines) and layers deeper in the model to learn high-order or more abstract features, like shapes or specific objects.

A limitation of the feature maps output by convolutional layers is that they record the precise position of features in the input. This means that small movements in the position of a feature in the input image will result in a different feature map. This can happen with re-cropping, rotation, shifting, and other minor changes to the input image.

A common approach to addressing this problem from signal processing is called down sampling. This is where a lower resolution version of an input signal is created that still contains the large or important structural elements, without the fine detail that may not be as useful to the task.

Down sampling can be achieved with convolutional layers by changing the stride of the convolution across the image. A more robust and common approach is to use a pooling layer.

A pooling layer is a new layer added after the convolutional layer, specifically after a nonlinearity (e.g. ReLU) has been applied to the feature maps output by the convolutional layer. For example, the layers in a model may look as follows:

Input Image
Convolutional Layer
Nonlinearity
Pooling Layer

The addition of a pooling layer after the convolutional layer is a common pattern used for ordering layers within a convolutional neural network that may be repeated one or more times in a given model.

The pooling layer operates upon each feature map separately to create a new set of the same number of pooled feature maps.

Pooling involves selecting a pooling operation, much like a filter to be applied to feature maps. The size of the pooling operation or filter is smaller than the size of the feature map; specifically, it is almost always 2×2 pixels applied with a stride of 2 pixels.

With a 2×2 pool and a stride of 2, the pooling layer reduces the size of each feature map by a factor of 2 in each dimension, i.e. each dimension is halved, reducing the number of pixels or values in each feature map to one quarter of the original. For example, a pooling layer applied to a feature map of 6×6 (36 pixels) will result in an output pooled feature map of 3×3 (9 pixels).
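The size arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes no padding is applied; the helper name `pooled_size` is made up for illustration and is not part of any library.

```python
def pooled_size(input_size, pool_size=2, stride=2):
    """Output length along one dimension for pooling without padding."""
    # Count how many full windows fit when stepping by `stride`.
    return (input_size - pool_size) // stride + 1


for n in (6, 8, 7):
    out = pooled_size(n)
    print(f"{n}x{n} -> {out}x{out} ({n * n} values -> {out * out} values)")
```

Under this assumption a 6×6 map becomes 3×3, as described. An odd-sized map such as 7×7 is not exactly halved: the last row and column do not fill a complete window and are dropped, which also gives 3×3.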

The pooling operation is specified, rather than learned. Two common functions used in the pooling operation are:

Average Pooling: Calculate the average value for each patch on the feature map.
Maximum Pooling (or Max Pooling): Calculate the maximum value for each patch of the feature map.
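To make the two operations concrete, here is a rough NumPy sketch of pooling written as explicit loops. The function name `pool2d` and the 4×4 feature map are invented for this illustration; the values were chosen so that the two operations give different results.

```python
import numpy as np


def pool2d(feature_map, size=2, stride=2, op=np.mean):
    """Apply `op` to each size x size patch, stepping by `stride` (no padding)."""
    rows = (feature_map.shape[0] - size) // stride + 1
    cols = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Cut out the current patch and reduce it to a single value.
            patch = feature_map[r * stride:r * stride + size,
                                c * stride:c * stride + size]
            pooled[r, c] = op(patch)
    return pooled


# A made-up 4x4 feature map (not from the tutorial's line detector).
fm = np.array([[1.0, 3.0, 0.0, 0.0],
               [5.0, 7.0, 0.0, 4.0],
               [2.0, 2.0, 6.0, 6.0],
               [2.0, 2.0, 6.0, 6.0]])

avg_pooled = pool2d(fm, op=np.mean)
max_pooled = pool2d(fm, op=np.max)
print(avg_pooled)
print(max_pooled)
```

For this input, average pooling gives the values 4, 1, 2, 6 and max pooling gives 7, 4, 2, 6: the two agree only where every value in a patch is the same.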

The result of using a pooling layer and creating down sampled or pooled feature maps is a summarized version of the features detected in the input. They are useful as small changes in the location of the feature in the input detected by the convolutional layer will result in a pooled feature map with the feature in the same location. This capability added by pooling is called the model’s invariance to local translation.

In all cases, pooling helps to make the representation become approximately invariant to small translations of the input. Invariance to translation means that if we translate the input by a small amount, the values of most of the pooled outputs do not change.

— Page 342, Deep Learning, 2016.
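The idea of local translation invariance can be illustrated with a small NumPy experiment. This is only a sketch with a made-up 4×4 feature map containing a single activation; the helper `max_pool_2x2` is hypothetical and assumes even height and width.

```python
import numpy as np


def max_pool_2x2(fm):
    """2x2 max pooling with stride 2; assumes even height and width."""
    h, w = fm.shape
    # Group rows and columns into pairs, then take the max of each 2x2 block.
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


base = np.zeros((4, 4))
base[0, 0] = 1.0  # a single detected feature in the top-left corner

shift_one = np.roll(base, 1, axis=1)  # feature moves one column to the right
shift_two = np.roll(base, 2, axis=1)  # feature moves two columns to the right

same_after_one = np.array_equal(max_pool_2x2(base), max_pool_2x2(shift_one))
same_after_two = np.array_equal(max_pool_2x2(base), max_pool_2x2(shift_two))
print(same_after_one)
print(same_after_two)
```

The one-column shift keeps the feature inside the same 2×2 patch, so the pooled map is unchanged, while the two-column shift moves it into the next patch and the pooled map changes. This is why the invariance is described as approximate: a shift of even one pixel that crosses a patch boundary will alter the output.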

Now that we are familiar with the need and benefit of pooling layers, let’s look at some specific examples.

Detecting Vertical Lines

Before we look at some examples of pooling layers and their effects, let’s develop a small example of an input image and convolutional layer to which we can later add and evaluate pooling layers.

In this example, we define a single input image or sample that has one channel and is an 8 pixel by 8 pixel square with all 0 values and a two-pixel wide vertical line in the center.

# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)

Next, we can define a model that expects input samples to have the shape (8, 8, 1) and has a single hidden convolutional layer with a single filter with the shape of 3 pixels by 3 pixels.

A rectified linear activation function, or ReLU for short, is then applied to each value in the feature map. This is a simple and effective nonlinearity that, in this case, will not change the values in the feature map because none of them are negative. It is present because we will later add pooling layers, and it is a best practice to apply pooling after the nonlinearity has been applied to the feature maps.

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
# summarize model
model.summary()

The filter is initialized with random weights as part of the initialization of the model.

Instead, we will hard code our own 3×3 filter that will detect vertical lines. That is, the filter will strongly activate when it detects a vertical line and weakly activate when it does not. We expect that applying this filter across the input image will produce an output feature map showing that the vertical line was detected.

# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
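The nested brackets in the detector can be hard to read. As a quick check that needs only NumPy, the snippet below converts the same list to an array and inspects its shape; the variable name `kernel` is introduced here just for illustration. For a Conv2D layer using the default channels-last layout, the kernel weights are arranged as [rows, columns, input channels, filters].

```python
from numpy import asarray

# The same vertical line detector used in the tutorial.
detector = [[[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]],
            [[[0]], [[1]], [[0]]]]
kernel = asarray(detector)

# Four dimensions: 3 rows, 3 columns, 1 input channel, 1 filter.
print(kernel.shape)

# Drop the two singleton dimensions to view the filter as a plain 3x3 grid.
print(kernel[:, :, 0, 0])
```

Viewed as a 3×3 grid, the filter is a column of ones flanked by columns of zeros, which is why it responds to vertical lines.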

Next, we can apply the filter to our input image by calling the predict() function on the model.

# apply filter to input data
yhat = model.predict(data)

The result is a four-dimensional output with one batch, a given number of rows and columns, and one filter, or [batch, rows, columns, filters]. We can print the activations in the single feature map to confirm that the line was detected.

# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Tying all of this together, the complete example is listed below.

# example of vertical line detection with a convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first summarizes the structure of the model.

Of note is that the single hidden convolutional layer will take the 8×8 pixel input image and will produce a feature map with the dimensions of 6×6.

We can also see that the layer has 10 parameters: that is nine weights for the filter (3×3) and one weight for the bias.

Finally, the single feature map is printed.

We can see from reviewing the numbers in the 6×6 matrix that indeed the manually specified filter detected the vertical line in the middle of our input image.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
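The same feature map can be reproduced without Keras, which may help show where the 3.0 values come from. The following is a minimal NumPy sketch of what the layer computes for this one filter, assuming no padding, a stride of 1, and a zero bias; it is not how Keras implements the operation internally.

```python
import numpy as np

# Same 8x8 image: zeros with a two-pixel-wide vertical line in the center.
image = np.zeros((8, 8))
image[:, 3:5] = 1.0

# Same 3x3 vertical line detector, written as a plain 2D array.
kernel = np.array([[0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Slide the kernel over every valid position (no padding, stride 1).
out = np.zeros((6, 6))
for r in range(6):
    for c in range(6):
        out[r, c] = np.sum(image[r:r + 3, c:c + 3] * kernel)

# ReLU, as in the Keras layer; all values are already non-negative here.
feature_map = np.maximum(out, 0.0)
print(feature_map)
```

Each output value is the sum of three vertically adjacent pixels under the filter's middle column, so it reaches 3.0 only where that column sits on the line.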

We can now look at some common approaches to pooling and how they impact the output feature maps.

Average Pooling Layer

On two-dimensional feature maps, pooling is typically applied in 2×2 patches of the feature map with a stride of (2,2).

Average pooling involves calculating the average for each patch of the feature map. This means that each 2×2 square of the feature map is down sampled to the average value in the square.

For example, the output of the line detector convolutional filter in the previous section was a 6×6 feature map. We can look at applying the average pooling operation to the first line of that feature map manually.

The first line for pooling (first two rows and six columns) of the output feature map was as follows:

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

The first pooling operation is applied as follows:

average(0.0, 0.0) = 0.0
        0.0, 0.0

Given the stride of two, the operation is moved along two columns to the right and the average is calculated:

average(3.0, 3.0) = 3.0
        3.0, 3.0

Again, the operation is moved along two columns to the right and the average is calculated:

average(0.0, 0.0) = 0.0
        0.0, 0.0

That’s it for the first line of pooling operations. The result is the first line of the average pooling operation:

[0.0, 3.0, 0.0]

Given the (2,2) stride, the operation would then be moved down two rows and back to the first column and the process continued.

Because the downsampling operation halves each dimension, we expect the output of pooling applied to the 6×6 feature map to be a new 3×3 feature map. Because every row of the input feature map is identical, we would expect each row of the pooled output to have the same average pooling values. Therefore, we would expect the resulting average pooling of the detected line feature map from the previous section to look as follows:

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
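This manual calculation can be double-checked with NumPy before involving Keras. The sketch below rebuilds the 6×6 feature map from the previous section and averages each 2×2 block using a reshape; it relies on the height and width being even.

```python
import numpy as np

# The 6x6 feature map printed by the line detector example.
feature_map = np.tile([0.0, 0.0, 3.0, 3.0, 0.0, 0.0], (6, 1))

# Split rows and columns into pairs, then average within each 2x2 block.
blocks = feature_map.reshape(3, 2, 3, 2)
avg_pooled = blocks.mean(axis=(1, 3))
print(avg_pooled)
```

The result is a 3×3 array in which every row is 0.0, 3.0, 0.0, matching the values worked out by hand.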

We can confirm this by updating the example from the previous section to use average pooling.

This can be achieved in Keras by using the AveragePooling2D layer. The default pool_size (analogous to the kernel size or filter size) of the layer is (2,2) and the default strides is None, which means the pool_size is used as the strides, in this case (2,2).

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(AveragePooling2D())

The complete example with average pooling is listed below.

# example of average pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import AveragePooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(AveragePooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first summarizes the model.

We can see from the model summary that the input to the pooling layer will be a single feature map with the shape (6,6) and that the output of the average pooling layer will be a single feature map with each dimension halved, with the shape (3,3).

Applying the average pooling results in a new feature map that still detects the line, although in a down sampled manner, exactly as we expected from calculating the operation manually.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
average_pooling2d_1 (Average (None, 3, 3, 1)           0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]

Average pooling works well, although it is more common to use max pooling.

Max Pooling Layer

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.

The results are down sampled or pooled feature maps that highlight the most present feature in the patch, rather than the average presence of the feature as in the case of average pooling. This has been found to work better in practice than average pooling for computer vision tasks like image classification.

In a nutshell, the reason is that features tend to encode the spatial presence of some pattern or concept over the different tiles of the feature map (hence, the term feature map), and it’s more informative to look at the maximal presence of different features than at their average presence.

— Page 129, Deep Learning with Python, 2017.

We can make the max pooling operation concrete by again applying it to the output feature map of the line detector convolutional operation and manually calculating the first row of the pooled feature map.

The first line for pooling (first two rows and six columns) of the output feature map was as follows:

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

The first max pooling operation is applied as follows:

max(0.0, 0.0) = 0.0
    0.0, 0.0

Given the stride of two, the operation is moved along two columns to the right and the max is calculated:

max(3.0, 3.0) = 3.0
    3.0, 3.0

Again, the operation is moved along two columns to the right and the max is calculated:

max(0.0, 0.0) = 0.0
    0.0, 0.0

That’s it for the first line of pooling operations.

The result is the first line of the max pooling operation:

[0.0, 3.0, 0.0]

Again, because every row of the feature map provided for pooling is identical, we would expect the pooled feature map to look as follows:

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]

It just so happens that the chosen line detector image and feature map produce the same output when downsampled with average pooling and maximum pooling.
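The two operations agree here only because each 2×2 patch contains four identical values. As a small illustration of a case where they differ, the NumPy sketch below pools a hypothetical feature map in which the detected line is only one column wide. The helper `pool_2x2` and the `thin_line` data are invented for this example and do not come from the tutorial's model.

```python
import numpy as np


def pool_2x2(fm, op):
    """2x2 pooling with stride 2; `op` is np.mean or np.max."""
    h, w = fm.shape
    # Group rows and columns into pairs, then reduce each 2x2 block.
    return op(fm.reshape(h // 2, 2, w // 2, 2), axis=(1, 3))


# Hypothetical 6x6 feature map with a line only one column wide.
thin_line = np.tile([0.0, 0.0, 3.0, 0.0, 0.0, 0.0], (6, 1))

avg_pooled = pool_2x2(thin_line, np.mean)
max_pooled = pool_2x2(thin_line, np.max)
print(avg_pooled)
print(max_pooled)
```

Each middle patch now holds two values of 3.0 and two of 0.0, so average pooling reports 1.5 while max pooling still reports 3.0. Max pooling preserves the strength of the detection, whereas average pooling dilutes it with the surrounding zeros.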

The maximum pooling operation can be added to the worked example by adding the MaxPooling2D layer provided by the Keras API.

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(MaxPooling2D())

The complete example of vertical line detection with max pooling is listed below.

# example of max pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(MaxPooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# enumerate rows
for r in range(yhat.shape[1]):
	# print each column in the row
	print([yhat[0,r,c,0] for c in range(yhat.shape[2])])

Running the example first summarizes the model.

We can see, as we might expect by now, that the output of the max pooling layer will be a single feature map with each dimension halved, with the shape (3,3).

Applying the max pooling results in a new feature map that still detects the line, although in a down sampled manner.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 3, 3, 1)           0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]
[0.0, 3.0, 0.0]

Global Pooling Layers

There is another type of pooling that is sometimes used called global pooling.

Instead of down sampling patches of the input feature map, global pooling down samples the entire feature map to a single value. This would be the same as setting the pool_size to the size of the input feature map.

Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models as an alternative to using a fully connected layer to transition from feature maps to an output prediction for the model.

Both global average pooling and global max pooling are supported by Keras via the GlobalAveragePooling2D and GlobalMaxPooling2D classes respectively.

For example, we can add global max pooling to the convolutional model used for vertical line detection.

# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(GlobalMaxPooling2D())

The outcome will be a single value that will summarize the strongest activation or presence of the vertical line in the input image.

The complete code listing is provided below.

# example of using global max pooling
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import GlobalMaxPooling2D
# define input data
data = [[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0],
		[0, 0, 0, 1, 1, 0, 0, 0]]
data = asarray(data)
data = data.reshape(1, 8, 8, 1)
# create model
model = Sequential()
model.add(Conv2D(1, (3,3), activation='relu', input_shape=(8, 8, 1)))
model.add(GlobalMaxPooling2D())
# summarize model
model.summary()
# define a vertical line detector
detector = [[[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]],
            [[[0]],[[1]],[[0]]]]
weights = [asarray(detector), asarray([0.0])]
# store the weights in the model
model.set_weights(weights)
# apply filter to input data
yhat = model.predict(data)
# print the single pooled value
print(yhat)

Running the example first summarizes the model.

We can see that, as expected, the output of the global pooling layer is a single value that summarizes the presence of the feature in the single feature map.

Next, the output of the model is printed showing the effect of global max pooling on the feature map, printing the single largest activation.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 6, 6, 1)           10
_________________________________________________________________
global_max_pooling2d_1 (Glob (None, 1)                 0
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________

[[3.]]
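Global pooling is simple to reproduce by hand, which may make the single output value less mysterious. The NumPy sketch below applies both global max and global average pooling to the 6×6 line detector feature map, then to a made-up stack of two feature maps to show that each map is reduced to its own value. The second feature map (the first one doubled) is an assumption added purely for illustration.

```python
import numpy as np

# The 6x6 feature map produced by the vertical line detector.
feature_map = np.tile([0.0, 0.0, 3.0, 3.0, 0.0, 0.0], (6, 1))

# Global pooling reduces the whole map to one number.
global_max = feature_map.max()
global_avg = feature_map.mean()
print(global_max, global_avg)

# With several feature maps (rows, columns, channels), reduce over the
# two spatial axes only, leaving one value per feature map.
stacked = np.stack([feature_map, 2.0 * feature_map], axis=-1)
per_map_max = stacked.max(axis=(0, 1))
print(per_map_max)
```

The global max is 3.0, matching the Keras output above. The global average is 1.0, because 12 of the 36 values are 3.0 and the rest are zero.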

Summary

In this tutorial, you discovered how the pooling operation works and how to implement it in convolutional neural networks.

Specifically, you learned:

Pooling can be used to down sample the detection of features in feature maps.
How to calculate and implement average and maximum pooling in a convolutional neural network.
How to use global pooling in a convolutional neural network.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

68 Responses to A Gentle Introduction to Pooling Layers for Convolutional Neural Networks

Bejoscha April 26, 2019 at 7:27 am #

Thanks. An interesting read.

- Jason Brownlee April 26, 2019 at 8:40 am #
  
  Thanks.
  
jamila May 9, 2019 at 10:39 pm #

I do not understand how global pooling works in coding results. please help

- Jason Brownlee May 10, 2019 at 8:17 am #
  
  Which part don’t you understand exactly?
  
jamila May 10, 2019 at 5:34 pm #

I’m focusing on results. how it gives us a single value?

- Jason Brownlee May 11, 2019 at 6:06 am #
  
  Average pooling gives a single output because it calculates the average of the inputs.
  
- Ohood fadil September 25, 2019 at 3:34 pm #
  
  What the algorithms we can use it in Convolutional layer?
  
  - Jason Brownlee September 26, 2019 at 6:30 am #
    
    You can discover how convolutional layers work in this tutorial:
    https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
    
    - ohood fadil April 26, 2020 at 6:45 am #
      
      hi ,How can you help me to understand the training phase in svm when i classification 2 class
      
      - Jason Brownlee April 27, 2020 at 5:20 am #
        
        Start here:
        https://machinelearningmastery.com/support-vector-machines-for-machine-learning/
Justin June 14, 2019 at 11:09 pm #

Excellent article, thank you so much for writing it. It could be helpful to create a slight variation of your examples where average and max pooling produce different results :).

Reply
- Jason Brownlee June 15, 2019 at 6:35 am #
  
  Great suggestion, thanks Justin.
  
  Reply
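
Following up on the suggestion above, here is a small sketch of a case where average and max pooling disagree. It assumes a made-up 4×4 feature map and a hypothetical helper `pool2d` that applies non-overlapping 2×2 windows; none of these names come from the tutorial.

```python
# Average vs. max pooling on the same feature map (illustrative values only).

def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        output.append(out_row)
    return output

def average(window):
    """Mean of the values in one pooling window."""
    return sum(window) / len(window)

feature_map = [
    [0.0, 1.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 8.0, 4.0, 4.0],
]

avg_pooled = pool2d(feature_map, 2, average)
max_pooled = pool2d(feature_map, 2, max)
print(avg_pooled)  # [[1.0, 1.0], [2.0, 4.0]]
print(max_pooled)  # [[3.0, 2.0], [8.0, 4.0]]
```

The two methods only agree on the bottom-right window, where all four values are equal. Elsewhere max pooling reports the strongest activation while average pooling dilutes it.
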
LELA June 21, 2019 at 2:31 am #

Case 1: If we apply average pooling, do we then need to place all the FC layers and then softmax?
Case 2: If we apply average pooling, do we then need to feed the resulting vector directly into softmax?

Case 3: Which sequence is correct: feature maps -> average pooling -> softmax, or feature maps -> average pooling -> FC layers -> softmax?

Case 4: Can we say that the services of average pooling can be achieved through GAP?

Case 5: In the case of multiple CNNs, how do we concatenate the feature maps into the average pooling?

Reply
- Jason Brownlee June 21, 2019 at 6:40 am #
  
  Not sure I agree, they are all options, not requirements.
  
  What are you getting at exactly?
  
  Reply
  - LELA June 21, 2019 at 2:44 pm #
    
    I am asking about classification/recognition when multiple CNNs are used.
    
    So, what is the proper sequence in which to place all the operations I mentioned above?
    
    I ask because the GAP research article mentions that when GAP is used there is no need to use FC layers. So what is the case for the average pooling layer?
    
    (1): If we want to use a CNN for images (classification/recognition task), can we use a softmax classifier directly after the average pooling layer (skipping the fully connected layers)?
    
    (2): Or, for classification/recognition of any input image, can we place FC layers after the average pooling layer and then softmax?
    
    And the last query: for image classification/recognition, what is the right option when multiple CNNs are used to extract the features from the images?
    
    Option 1: Average pooling layer or GAP
    Option 2: Average pooling layer + Softmax?
    Option 3: Average pooling layer + FC-layers + Softmax?
    Option 4: Feature Maps + GAP?
    Option 5: Feature Maps + GAP + FC-layers + Softmax?
    
    I am asking in detail because I read multiple sources, but it was not quite clear exactly which procedure should be used. Also, after reading, I feel that average pooling and GAP can provide the same services.
    
    Reply
    - Jason Brownlee June 22, 2019 at 6:28 am #
      
      There is no single best way. There are no rules and models differ, so it is a good idea to experiment to see what works best for your specific dataset.
      
      You can use a softmax after global pooling or a dense layer, or just a dense layer and no global pooling, or many other combinations.
      
      It might be a good idea to look at the architecture of some well-performing models like VGG, ResNet, and Inception and try their proposed architecture in your model to see how it compares, or to get ideas.
      
      Reply
Rango September 1, 2019 at 1:35 pm #

Great Article!!!

Reply
- Jason Brownlee September 2, 2019 at 5:25 am #
  
  Thanks, I’m glad it helped!
  
  Reply
JustVenky September 20, 2019 at 3:22 pm #

can we use random forests for pooling

Reply
- Jason Brownlee September 21, 2019 at 6:44 am #
  
  No.
  
  Reply
RoyHJ November 2, 2019 at 11:08 pm #

Thank you for the clear definitions and nice examples.

A couple of questions about using global pooling at the end of a CNN model (before the fully connected as e.g. resnet):

What would you say are the advantages/disadvantages of using global avg pooling vs global max pooling as a final layer of the feature extraction (are there cases where max would be preferred)?

When switching between the two, how does it affect hyper parameters such as learning rate and weight regularization? (since max doesn’t pass gradients through all of the features, opposed to avg?)

You wrote: “Global pooling can be used in a model to aggressively summarize the presence of a feature in an image. It is also sometimes used in models **as an alternative** to using a fully connected layer to transition from feature maps to an output prediction for the model.”

Wouldn’t it be more accurate to say that (usually in the cnn domain) global pooling is sometimes added *before* (i.e. in addition) a fully connected (fc) layer in the transition from feature maps to an output prediction for the model (both giving the features global attention and reducing computation of the fc layer)?

In order for global pooling to replace the last fc layer, you would need to equalize the number of channels to the number of classes first (e.g. 1×1 conv?), this would be heavier (computationally-wise) and a somewhat different operation than adding a fc after the global pool (e.g. as it’s done in common cnn models with a final global pooling layer). Is this actually ever done this way?

Reply
- Jason Brownlee November 3, 2019 at 5:58 am #
  
  Thanks.
  
  You could probably construct post hoc arguments about the differences. I’d recommend testing them both and using results to guide you.
  
  No, global pooling is used instead of a fully connected layer – they are used as output layers. Inspect some of the classical models to confirm.
  
  It does, they output a vector.
  
  Reply
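
To make the arrangement discussed in this thread concrete, here is a rough sketch of global average pooling feeding a softmax directly. It assumes the final convolutional layer has already produced exactly one feature map per class (three made-up 2×2 maps for a three-class problem); the helper names are hypothetical. Placing a dense layer between the pooling and the softmax would add a learned weighted sum at the marked point, which this sketch leaves out.

```python
import math

# Global average pooling followed directly by softmax (illustrative values).
# Assumption: the number of feature maps equals the number of classes.

def global_average_pool(feature_map):
    """Collapse one 2D feature map to its mean value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def softmax(scores):
    """Convert a list of scores to probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical final feature maps, one per class.
class_maps = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

pooled = [global_average_pool(fm) for fm in class_maps]  # one score per class
# (A dense layer, if used, would transform `pooled` here.)
probabilities = softmax(pooled)
predicted_class = probabilities.index(max(probabilities))
print(pooled, predicted_class)
```
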
AH December 10, 2019 at 11:55 pm #

Thanks, it is really nice explanation of pooling. Very readable and informative thanks to the examples.

Reply
- Jason Brownlee December 11, 2019 at 6:59 am #
  
  Thanks, I’m happy it helped.
  
  Reply
Vinay January 26, 2020 at 12:47 am #

What does the below sentence about pooling layers mean?

“This means that small movements in the position of the feature in the input image will result in a different feature map.
“

Reply
- Jason Brownlee January 26, 2020 at 5:19 am #
  
  It means that slightly different images that look the same to our eyes look very different to the model.
  
  Reply
  - Amatul Saboor July 20, 2020 at 4:37 pm #
    
    Then how does it recognize an image as a dog that does have a dog in it but not in the center? This means those huge movements in the position of the dog’s feature in the input image will look very much different to the model.
    
    Reply
    - Jason Brownlee July 21, 2020 at 5:53 am #
      
      Yes, a property of the CNN architecture is that it is invariant to the position of features in the input, e.g. if the model knows what a dog is, then the dog can appear almost anywhere in any position and still be detected correctly (within limits).
      
      Reply
      - Amatul Saboor July 22, 2020 at 4:30 am #
        
        Yes, I understand. My question is how a CNN is invariant to the position of features in the input. With the pooling layers, only the problem of a slight difference in the input can be solved (as you mentioned above). Then how is this big difference in position (from the center to the corner) solved? Do we have any other type of layer to do this?
      - Jason Brownlee July 22, 2020 at 5:45 am #
        
        The conv and pooling layers when stacked achieve feature invariance together.
        
        Perhaps I don’t understand your question.
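
As a small illustration of the "slight difference" case raised in this thread, the sketch below applies max pooling with a window of 2 to a one-row feature map and to shifted copies of it. The values and the helper `max_pool_1d` are made up for illustration. A single pooling layer only absorbs a shift that stays inside one pooling window, which is why the invariance is called local.

```python
# Local translation invariance of max pooling (illustrative 1D feature maps).

def max_pool_1d(row, size=2):
    """Max over non-overlapping windows of the given size."""
    return [max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]

original       = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
shifted_by_one = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0]  # feature moved one step
shifted_by_two = [0.0, 0.0, 0.0, 0.0, 3.0, 0.0]  # feature moved two steps

print(max_pool_1d(original))        # [0.0, 3.0, 0.0]
print(max_pool_1d(shifted_by_one))  # [0.0, 3.0, 0.0]  same as the original
print(max_pool_1d(shifted_by_two))  # [0.0, 0.0, 3.0]  the output has moved
```

Whether a one-step shift is absorbed depends on where the window boundaries fall. Here the feature stays inside the middle window for the one-step shift and crosses into the next window for the two-step shift.
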
Shailu February 21, 2020 at 7:17 am #

Hello Jason, I am working on training a convolutional neural network through transfer learning. I want to find the mean of the inter-class standard deviation for each convolutional layer to identify the best convolutional layer to freeze. Any help would be appreciated.

Reply
- Jason Brownlee February 21, 2020 at 8:32 am #
  
  Not sure I follow, sorry. Perhaps post your question to stackoverflow?
  
  Reply
Sagnik February 26, 2020 at 8:10 pm #

Hi Jason
Great post!
One of the frequently asked questions is why do we need a pooling operation after convolution in a CNN. The fact that you highlighted, making the image detector translation-invariant, is a very important point.
Are there methods to make the detector rotation-invariant as well?
Thanks

Reply
- Jason Brownlee February 27, 2020 at 5:43 am #
  
  Thanks.
  
  Yes, train with rotated versions of the images. This is called data augmentation.
  
  Reply
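
A minimal sketch of the rotation idea above, assuming a tiny made-up 3×3 image stored as nested lists and restricted to multiples of 90 degrees so that no interpolation is needed. The helper names are hypothetical; a real pipeline would normally use an image library and arbitrary angles.

```python
# Rotation-based data augmentation sketch (90-degree steps, plain lists).

def rotate_90_clockwise(image):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotations(image):
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate_90_clockwise(variants[-1]))
    return variants

# Hypothetical image: a vertical stroke with a small foot at the bottom right.
image = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]

augmented = rotations(image)
for variant in augmented:
    print(variant)
```

Each rotated copy keeps the label of the original image, so the model sees the same object in several orientations during training.
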
Chinmay March 7, 2020 at 6:28 pm #

Hi,

Thanks for the amazing post.

I have one doubt. I am building my own CNN and i am using max pooling. I did understand the forward propagation from the explanation.

I was wondering about backward propagation: we save the index of the maximum value and insert ‘1’ at that index. But in the example you showed, all the values in a window are the same, for example ‘0’ in the first 2 x 2 cell. So do we insert ‘1’ for all the zeros here, or for one random ‘0’? Similarly, if we have a 2 x 2 cell in which all values are the same (0.9), do we insert ‘1’ for every ‘0.9’ or for a random one?

Thanks again for the post

Reply
- Jason Brownlee March 8, 2020 at 6:08 am #
  
  Thanks.
  
  Sorry, I don’t quite follow your question. Perhaps you can rephrase it?
  
  Reply
Chinmay Appa Rane March 10, 2020 at 10:52 am #

Hi,

Thanks for your reply.

Sorry for confusion. i was wondering about the backpropagation for the Max pooling example you mentioned.

[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
[0.0, 0.0, 3.0, 3.0, 0.0, 0.0]

the forward propagation for above matrix is,

[0.0, 3.0, 0.0]

So, is the derivative of the matrix ‘1’ at the largest value we picked during forward propagation?

But what if all the values in the 2 x 2 pooling window are the same?

Is it ‘1’ for any one randomly chosen maximum value of ‘3.0’?
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

or

‘1’ for all the maximum values
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

***Also, i assume for all zeros the derivative is ‘0’(not sure)

Thanks again

Chinmay

Reply
- Jason Brownlee March 10, 2020 at 1:39 pm #
  
  Pooling layers do not have any weights, i.e. they have no parameters that are updated during learning.
  
  Reply
  - Chinmay Appa Rane March 11, 2020 at 4:42 pm #
    
    Thank you for your reply. I was confused about this because I read in some CNN posts that we need to save the indices of the maximum values we choose during pooling. The posts didn’t properly explain the use of the saved indices, so I assumed they are used during back propagation.
    
    Thank you for clearing the doubt
    
    Reply
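
Although a pooling layer has no weights of its own, the error gradient still has to pass back through it to reach the layers before it, and that is where the saved indices come in. The sketch below uses the 2×6 input from this thread. It assumes one possible tie-breaking convention, sending the whole gradient to the first maximum found in each window; this is an assumption for illustration, and libraries may handle ties differently. The function names are hypothetical.

```python
# Max pooling forward and backward pass (2x2 windows, stride 2).
# Assumption: on ties, the gradient goes to the first maximum in the window.

def max_pool_forward(x, size=2):
    """Return the pooled output and the (row, col) of each window's maximum."""
    output, argmax = [], []
    for r in range(0, len(x) - size + 1, size):
        out_row, idx_row = [], []
        for c in range(0, len(x[0]) - size + 1, size):
            best = (r, c)
            # Row-major scan; strict '>' keeps the first maximum on ties.
            for i in range(r, r + size):
                for j in range(c, c + size):
                    if x[i][j] > x[best[0]][best[1]]:
                        best = (i, j)
            out_row.append(x[best[0]][best[1]])
            idx_row.append(best)
        output.append(out_row)
        argmax.append(idx_row)
    return output, argmax

def max_pool_backward(upstream, argmax, input_shape):
    """Route each upstream gradient to the saved position of the maximum."""
    rows, cols = input_shape
    grad = [[0.0] * cols for _ in range(rows)]
    for r, idx_row in enumerate(argmax):
        for c, (i, j) in enumerate(idx_row):
            grad[i][j] += upstream[r][c]
    return grad

x = [[0.0, 0.0, 3.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]]

pooled, argmax = max_pool_forward(x)
grad = max_pool_backward([[1.0, 1.0, 1.0]], argmax, (2, 6))
print(pooled)  # [[0.0, 3.0, 0.0]]
print(grad)    # [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
```

Under this convention a window of all zeros still passes its gradient to one position, because the selected value is the maximum of that window even though it is zero.
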
Blane May 14, 2020 at 12:15 am #

Interesting, but it would be simpler and more useful if you just used an eight by eight pixel image and showed the outputs. this is too abstract for concepts which are already abstract

Reply
- Jason Brownlee May 14, 2020 at 5:51 am #
  
  Thanks for the suggestion.
  
  Reply
Dhruv September 11, 2020 at 4:50 am #

Hello sir, the tutorial was amazing, but I had a doubt.

At the start of the tutorial, you said “This means that small movements in the position of the feature in the input image will result in a different feature map”. Why do we even care if it’s a different feature map? It would still have all of its features in it as before, just at a different location now.

Eg: Imagine we have a kernel that detects ‘lips’. We trained it on images of lips where, in all images, the lips were present in the center of the image. After training, we would have a kernel that could detect lips. Now if we show an image where the lips are present at the top right, it would still do a good job because it is a kernel that detects lips.

So, even if the location of the features in the feature map changes, the CNN should still do a good job.

So, why do we care if it’s a different feature map, when it still contains all the same features, but at a different location?

Reply
- Jason Brownlee September 11, 2020 at 6:03 am #
  
  Thanks!
  
  Great question. We care because the model will extract different features – making the data inconsistent when in fact it is consistent. This makes learning harder and model performance worse.
  
  Reply
  - Dhruv September 11, 2020 at 11:49 pm #
    
    Thank you for your reply.
    
    By ‘different features’, do you mean that the model will extract different sets of features for an image that has been changed a little from the one with no change?
    
    Reply
    - Jason Brownlee September 12, 2020 at 6:16 am #
      
      Yes, rotated versions of the same image might mean extracting different features.
      
      If we use pooling we may achieve some rotation invariance in feature extraction.
      
      Reply
Dhruv September 12, 2020 at 1:54 pm #

ahh I see. Thank you. You really are a master of machine learning.

Reply
- Jason Brownlee September 13, 2020 at 5:58 am #
  
  No, just a simple human.
  
  Reply
Gabriel October 1, 2020 at 3:48 am #

Hello Jason! Thanks for all the tutorials you have done! I am new to Data Science and I am studying it on my own, so your posts have been really, really useful to me.

I have one question, though. Is there any situation you would not recommend using pooling layers in a CNN? Or are they one of those things that “it never hurts to have one”?

Reply
- Jason Brownlee October 1, 2020 at 6:32 am #
  
  Not really. If you are unsure for your model, compare performance with and without the layers and use whatever results in the best performance.
  
  Reply
Paweł December 22, 2020 at 5:37 am #

Hi,
Trying to wrap my head around it and understand a bit more about how a CNN like YOLO works. What I kind of get is the convolution part, in other words the detection and categorisation, but I don’t really get how such networks mark detected subjects by drawing a border around them.

This probably is far more complicated 😉 but maybe you can push me in some direction.

Reply
- Jason Brownlee December 22, 2020 at 6:52 am #
  
  Perhaps start here:
  https://machinelearningmastery.com/object-recognition-with-deep-learning/
  
  Reply
Anuj February 22, 2021 at 8:37 am #

Sir, are your books written and updated for TensorFlow 2.0? I ask because I’m planning to buy Deep Learning. Please reply, sir.

Reply
- Jason Brownlee February 22, 2021 at 8:53 am #
  
  Yes, this is a common question that I answer here:
  https://machinelearningmastery.com/faq/single-faq/do-you-support-tensorflow-2
  
  Reply
Hosein February 22, 2021 at 6:57 pm #

Hello Jason,

I wonder which one is better after some ResBlocks for a grayscale image, global sum pooling, or global max pooling?

tnx,
Hosein

Reply
- Jason Brownlee February 23, 2021 at 6:18 am #
  
  Perhaps test and compare on your specific dataset.
  
  Reply
shimaa February 25, 2021 at 9:15 pm #

I want to ask whether pooling affects the back propagation (derivative) calculations?

Reply
- shimaa February 25, 2021 at 9:19 pm #
  
  pooling layers * i mean
  
  Reply
- Jason Brownlee February 26, 2021 at 4:58 am #
  
  Yes, it changes the structure of the data flow at that point of the network.
  
  Reply
Hind Almisbahi April 22, 2021 at 12:12 am #

Thank you so much Jason. Your articles are really helpful to get clear intuition. Any time I need to understand a concept in machine learning, I search first in your articles.

Reply
- Jason Brownlee April 22, 2021 at 5:39 am #
  
  Thanks.
  
  Reply
Yash raj July 7, 2021 at 4:08 am #

In the ‘Detecting vertical lines’ code, data.reshape(1, 8, 8, 1) has 4 parameters. I understood (8, 8, 1) meant 8 pixels by 8 pixels and 1 channel, but what is the purpose of the other ‘1’.
P.s. please go easy on me I’m new to AI :).

Reply
- Jason Brownlee July 7, 2021 at 5:35 am #
  
  It is 1 sample, e.g. one image. We often work with many images in a batch. It’s more efficient.
  
  Reply
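
To make the four axes in `data.reshape(1, 8, 8, 1)` visible, here is a sketch that builds the same layout by hand with nested Python lists instead of an array reshape. The pixel values (a vertical line down the middle) and the helper `shape` are illustrative assumptions.

```python
# What the shape (1, 8, 8, 1) means: [samples, rows, columns, channels].

# An 8x8 single-channel image with a vertical line down the middle.
image = [[0, 0, 0, 1, 1, 0, 0, 0] for _ in range(8)]

# Add the channels axis: every pixel becomes a list holding one value.
with_channels = [[[pixel] for pixel in row] for row in image]

# Add the samples axis: a batch that contains just this one image.
batch = [with_channels]

def shape(nested):
    """Report the length of each nesting level, like an array's shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

print(shape(image))  # (8, 8)
print(shape(batch))  # (1, 8, 8, 1)
```

A batch of 32 such images would have the shape (32, 8, 8, 1); only the first number changes.
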
Najeh November 26, 2021 at 6:05 am #

How to perform global sum pooling in pytorch (with and without the view() function). Thank you

Reply
- Adrian Tam November 29, 2021 at 8:28 am #
  
  Use average pooling instead. It is the same as sum pooling up to a constant scaling factor.
  
  Reply
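
The relationship mentioned above can be checked by hand. This sketch uses a made-up 2×2 feature map and hypothetical helper names in plain Python rather than PyTorch: the global sum equals the global average multiplied by the number of values in the map.

```python
# Global sum pooling vs. global average pooling (illustrative values).

def global_sum_pool(feature_map):
    """Sum of every value in one 2D feature map."""
    return sum(v for row in feature_map for v in row)

def global_average_pool(feature_map):
    """Mean of every value in one 2D feature map."""
    count = len(feature_map) * len(feature_map[0])
    return global_sum_pool(feature_map) / count

feature_map = [[1.0, 2.0],
               [3.0, 4.0]]

total = global_sum_pool(feature_map)     # 10.0
mean = global_average_pool(feature_map)  # 2.5
scale = len(feature_map) * len(feature_map[0])  # 4 values in the map

print(total, mean * scale)  # 10.0 10.0
```

The scaling factor is constant only while the feature map size is fixed. If the input size varies, the two poolings are no longer related by a single constant.
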
Erfan December 15, 2021 at 4:07 am #

why max pooling is better??!

Reply
- Adrian Tam December 15, 2021 at 7:24 am #
  
  Not always better. It has just been found experimentally to be better in computer vision tasks.
  
  Reply
H March 6, 2022 at 7:57 am #

Hello
Can you give me the MATLAB code for average pooling?

Reply
- James Carmichael March 6, 2022 at 1:03 pm #
  
  Hi H…I do not have tutorials in Octave or Matlab.
  
  I believe Octave and Matlab are excellent platforms for learning how machine learning algorithms work in an academic setting.
  
  I do not think that they are good platforms for applied machine learning in industry, which is the focus of my website.
  
  Reply
