How to use the UpSampling2D and Conv2DTranspose Layers in Keras

By Jason Brownlee on July 12, 2019 in Generative Adversarial Networks

Generative Adversarial Networks, or GANs, are an architecture for training generative models, such as deep convolutional neural networks for generating images.

The GAN architecture is comprised of both a generator and a discriminator model. The generator is responsible for creating new outputs, such as images, that plausibly could have come from the original dataset. The generator model is typically implemented using a deep convolutional neural network with specialized layers that learn to fill in features in an image rather than extract features from an input image.

Two common types of layers that can be used in the generator model are an upsampling layer (UpSampling2D), which by default simply doubles the dimensions of the input, and the transpose convolutional layer (Conv2DTranspose), which performs an inverse convolution operation.

In this tutorial, you will discover how to use UpSampling2D and Conv2DTranspose Layers in Generative Adversarial Networks when generating images.

After completing this tutorial, you will know:

Generative models in the GAN architecture are required to upsample input data in order to generate an output image.
The Upsampling layer is a simple layer with no weights that will double the dimensions of input and can be used in a generative model when followed by a traditional convolutional layer.
The Transpose Convolutional layer is an inverse convolutional layer that will both upsample input and learn how to fill in details during the model training process.

Kick-start your project with my new book Generative Adversarial Networks with Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to Upsampling and Transpose Convolution Layers for Generative Adversarial Networks
Photo by BLM Nevada, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Need for Upsampling in GANs
How to Use the Upsampling Layer
How to Use the Transpose Convolutional Layer

Need for Upsampling in Generative Adversarial Networks

Generative Adversarial Networks are an architecture for neural networks for training a generative model.

The architecture is comprised of a generator and a discriminator model, both of which are implemented as a deep convolutional neural network. The discriminator is responsible for classifying images as either real (from the domain) or fake (generated). The generator is responsible for generating new plausible examples from the problem domain.

The generator works by taking a random point from the latent space as input and outputting a complete image, in a one-shot manner.

A traditional convolutional neural network for image classification, and related tasks, will use pooling layers to downsample input images. For example, an average pooling or max pooling layer will reduce the feature maps from a convolutional layer by half on each dimension, resulting in an output that is one quarter the area of the input.
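
As a rough illustration of the downsampling described above, the following NumPy sketch applies non-overlapping 2×2 max pooling to a contrived 4×4 feature map. The helper name max_pool_2x2 is hypothetical and it assumes the height and width are even; it is not how Keras implements pooling internally.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with non-overlapping 2x2 max pooling."""
    rows, cols = feature_map.shape
    # group the map into 2x2 blocks, then keep the maximum of each block
    blocks = feature_map.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

# contrived 4x4 feature map with the values 0..15
feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(feature_map.shape, '->', pooled.shape)
print(pooled)
```

Each dimension is halved (4×4 to 2×2), so the output covers one quarter of the input area.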

Convolutional layers themselves also perform a form of downsampling by applying each filter across the input images or feature maps; the resulting activations are an output feature map that is smaller because of the border effects. Often padding is used to counter this effect.
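
The border effect can be seen in a small NumPy sketch. The function cross_correlate_valid below is a hypothetical helper written for this illustration (a single channel, no bias, stride of 1); it slides a 3×3 filter over a 5×5 input with and without zero padding.

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((5, 5))
kernel = np.ones((3, 3))
# no padding: the output shrinks because of the border effect
valid = cross_correlate_valid(image, kernel)
# one pixel of zero padding on each side restores the input size
same = cross_correlate_valid(np.pad(image, 1), kernel)
print(valid.shape)  # (3, 3)
print(same.shape)   # (5, 5)
```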

The generator model in a GAN requires the inverse of the pooling operation used in a traditional convolutional neural network. It needs a layer to translate from coarse salient features to a more dense and detailed output.

A simple version of an unpooling or opposite pooling layer is called an upsampling layer. It works by repeating the rows and columns of the input.

A more elaborate approach is to perform a backwards convolutional operation. It was originally referred to as a deconvolution, which is technically incorrect, and is now more commonly referred to as a fractional convolutional layer or a transposed convolutional layer.

Both of these layers can be used in a GAN to perform the required upsampling operation to transform a small input into a large image output.

In the following sections, we will take a closer look at each and develop an intuition for how they work so that we can use them effectively in our GAN models.

How to Use the UpSampling2D Layer

Perhaps the simplest way to upsample an input is to double each row and column.

For example, an input image with the shape 2×2 would be output as 4×4.

         1, 2
Input = (3, 4)

          1, 1, 2, 2
Output = (1, 1, 2, 2)
          3, 3, 4, 4
          3, 3, 4, 4
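
The same doubling can be sketched directly in NumPy, independent of Keras, by repeating each row and then each column. This is only an illustration of the idea; the variable names are chosen for this example.

```python
import numpy as np

# contrived 2x2 input
X = np.asarray([[1, 2],
                [3, 4]])
# repeat every row twice (axis 0), then every column twice (axis 1)
upsampled = np.repeat(np.repeat(X, 2, axis=0), 2, axis=1)
print(upsampled)
```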

Worked Example Using the UpSampling2D Layer

The Keras deep learning library provides this capability in a layer called UpSampling2D.

It can be added to a convolutional neural network and repeats the rows and columns provided as input in the output. For example:

...
# define model
model = Sequential()
model.add(UpSampling2D())

We can demonstrate the behavior of this layer with a simple contrived example.

First, we can define a contrived input image that is 2×2 pixels. We can use specific values for each pixel so that after upsampling, we can see exactly what effect the operation had on the input.

...
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)

Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.

...
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

We can now define our model.

The model has only the UpSampling2D layer which takes 2×2 grayscale images as input directly and outputs the result of the upsampling operation.

...
# define model
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1)))
# summarize the model
model.summary()

We can then use the model to make a prediction, that is, upsample a provided input image.

...
# make a prediction with the model
yhat = model.predict(X)

The output will have four dimensions, like the input; therefore, we can convert it back to a two-dimensional 4×4 array to make it easier to review the result.

...
# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
# summarize output
print(yhat)

Tying all of this together, the complete example of using the UpSampling2D layer in Keras is provided below.

# example of using the upsampling layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import UpSampling2D
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))
# define model
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1)))
# summarize the model
model.summary()
# make a prediction with the model
yhat = model.predict(X)
# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
# summarize output
print(yhat)

Running the example first creates and summarizes our 2×2 input data.

Next, the model is summarized. We can see that it will output a 4×4 result as we expect, and importantly, the layer has no parameters or model weights. This is because it is not learning anything; it is just doubling the input.

Finally, the model is used to upsample our input, resulting in a doubling of each row and column for our input data, as we expected.

[[1 2]
 [3 4]]

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
up_sampling2d_1 (UpSampling2 (None, 4, 4, 1)           0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


[[1. 1. 2. 2.]
 [1. 1. 2. 2.]
 [3. 3. 4. 4.]
 [3. 3. 4. 4.]]

By default, the UpSampling2D will double each input dimension. This is defined by the ‘size‘ argument that is set to the tuple (2,2).

You may want to use different factors on each dimension, such as doubling the height and tripling the width. This could be achieved by setting the ‘size‘ argument to (2, 3), which is ordered as (rows, columns). The result of applying this operation to a 2×2 image would be an output image with 4 rows and 6 columns (e.g. 2×2 and 2×3). For example:

...
# example of using different scale factors for each dimension
model.add(UpSampling2D(size=(2, 3)))
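
To check which dimension each factor applies to, the nearest-neighbor behavior can be mimicked in NumPy. The helper upsample_nearest is hypothetical and assumes the ‘size‘ tuple is ordered as (rows, columns), as in the Keras layer.

```python
import numpy as np

def upsample_nearest(image, size=(2, 2)):
    """Repeat rows by size[0] and columns by size[1]."""
    row_factor, col_factor = size
    repeated_rows = np.repeat(image, row_factor, axis=0)
    return np.repeat(repeated_rows, col_factor, axis=1)

X = np.asarray([[1, 2],
                [3, 4]])
out = upsample_nearest(X, size=(2, 3))
print(out.shape)  # (4, 6): rows doubled, columns tripled
print(out)
```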

Additionally, by default, the UpSampling2D layer will use a nearest neighbor algorithm to fill in the new rows and columns. This has the effect of simply doubling rows and columns, as described and is specified by the ‘interpolation‘ argument set to ‘nearest‘.

Alternatively, a bilinear interpolation method can be used, which draws upon multiple surrounding points. This can be specified by setting the ‘interpolation‘ argument to ‘bilinear‘. For example:

...
# example of using bilinear interpolation when upsampling
model.add(UpSampling2D(interpolation='bilinear'))

Simple Generator Model With the UpSampling2D Layer

The UpSampling2D layer is simple and effective, although it does not perform any learning.

It is not able to fill in useful detail in the upsampling operation. To be useful in a GAN, each UpSampling2D layer must be followed by a Conv2D layer that will learn to interpret the doubled input and be trained to translate it into meaningful detail.

We can demonstrate this with an example.

In this case, our little GAN generator model must produce a 10×10 image and take a 100 element vector from the latent space as input.

First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5×5 image.

...
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 images
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))

Next, the 128 5×5 feature maps can be upsampled to 128 10×10 feature maps.

...
# double input from 128 5x5 to 128 10x10 feature maps
model.add(UpSampling2D())

Finally, the upsampled feature maps can be interpreted and filled in with hopefully useful detail by a Conv2D layer.

The Conv2D has a single feature map as output to create the single image we require.

...
# fill in detail in the upsampled feature maps
model.add(Conv2D(1, (3,3), padding='same'))

Tying this together, the complete example is listed below.

# example of using upsampling in a simple generator model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import UpSampling2D
from keras.layers import Conv2D
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 images
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))
# double input from 128 5x5 to 128 10x10 feature maps
model.add(UpSampling2D())
# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding='same'))
# summarize model
model.summary()

Running the example creates the model and summarizes the output shape of each layer.

We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5×5.

The widths and heights are doubled to 10×10 by the UpSampling2D layer, resulting in 128 feature maps, each with quadruple the area.

Finally, the Conv2D processes these feature maps and adds in detail, outputting a single 10×10 image.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 3200)              323200
_________________________________________________________________
reshape_1 (Reshape)          (None, 5, 5, 128)         0
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 10, 10, 128)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 10, 10, 1)         1153
=================================================================
Total params: 324,353
Trainable params: 324,353
Non-trainable params: 0
_________________________________________________________________
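
The parameter counts in the summary can be checked by hand. The sketch below uses two hypothetical helper functions and assumes the usual layout of one weight per input-output connection plus one bias per unit or filter.

```python
def dense_params(inputs, units):
    """One weight per input for each unit, plus one bias per unit."""
    return inputs * units + units

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel per input channel for each filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

dense = dense_params(100, 128 * 5 * 5)
conv = conv2d_params(3, 3, 128, 1)
print(dense)         # 323200
print(conv)          # 1153
print(dense + conv)  # 324353
```

The Reshape and UpSampling2D layers contribute nothing to the total because they have no weights.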

How to Use the Conv2DTranspose Layer

The Conv2DTranspose or transpose convolutional layer is more complex than a simple upsampling layer.

A simple way to think about it is that it both performs the upsample operation and interprets the coarse input data to fill in the detail while it is upsampling. It is like a layer that combines the UpSampling2D and Conv2D layers into one layer. This is a crude understanding, but a practical starting point.

The need for transposed convolutions generally arises from the desire to use a transformation going in the opposite direction of a normal convolution, i.e., from something that has the shape of the output of some convolution to something that has the shape of its input while maintaining a connectivity pattern that is compatible with said convolution

— A Guide To Convolution Arithmetic For Deep Learning, 2016.

In fact, the transpose convolutional layer performs an inverse convolution operation.

Specifically, the forward and backward passes of the convolutional layer are reversed.

One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.

— A Guide To Convolution Arithmetic For Deep Learning, 2016.

It is sometimes called a deconvolution or deconvolutional layer and models that use these layers can be referred to as deconvolutional networks, or deconvnets.

A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse, so instead of mapping pixels to features does the opposite.

— Visualizing and Understanding Convolutional Networks, 2013.

Referring to this operation as a deconvolution is technically incorrect as a deconvolution is a specific mathematical operation not performed by this layer.

In fact, the traditional convolutional layer does not technically perform a convolutional operation, it performs a cross-correlation.
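
The difference between the two operations is easiest to see in one dimension. In this small NumPy sketch (the signal and kernel values are arbitrary choices for illustration), a true convolution flips the kernel before sliding it, whereas a cross-correlation does not.

```python
import numpy as np

signal = np.asarray([1, 2, 3, 4])
kernel = np.asarray([1, 0, -1])

# cross-correlation: slide the kernel as-is (what a Conv layer computes)
correlated = np.correlate(signal, kernel, mode='valid')
# convolution: the kernel is flipped before sliding
convolved = np.convolve(signal, kernel, mode='valid')
# convolving with a pre-flipped kernel recovers the cross-correlation
flipped = np.convolve(signal, kernel[::-1], mode='valid')

print(correlated)  # [-2 -2]
print(convolved)   # [2 2]
print(flipped)     # [-2 -2]
```

Because the kernel weights are learned, the flip makes no practical difference to what a network can represent.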

The deconvolution layer, to which people commonly refer, first appears in Zeiler’s paper as part of the deconvolutional network but does not have a specific name. […] It also has many names including (but not limited to) subpixel or fractional convolutional layer, transposed convolutional layer, inverse, up or backward convolutional layer.

— Is the deconvolution layer the same as a convolutional layer?, 2016.

It is a very flexible layer, although we will focus on its use in generative models for upsampling an input image.

The transpose convolutional layer is much like a normal convolutional layer. It requires that you specify the number of filters and the kernel size of each filter. The key to the layer is the stride.

Typically, the stride of a convolutional layer is 1×1; that is, a filter is moved along one pixel horizontally for each read from left to right, then down one pixel for the next row of reads. A stride of 2×2 on a normal convolutional layer has the effect of downsampling the input, much like a pooling layer. In fact, a 2×2 stride can be used instead of a pooling layer in the discriminator model.
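
The effect of the stride on the output size of a normal convolutional layer can be sketched with a small helper. The function conv_output_length is hypothetical; it follows the standard size arithmetic for ‘valid‘ and ‘same‘ padding and assumes no dilation.

```python
import math

def conv_output_length(input_length, kernel_size, stride, padding):
    """Output length along one dimension of a standard convolution."""
    if padding == 'same':
        return math.ceil(input_length / stride)
    if padding == 'valid':
        return (input_length - kernel_size) // stride + 1
    raise ValueError('padding must be "same" or "valid"')

# stride 1 keeps the size with 'same' padding
print(conv_output_length(10, 3, 1, 'same'))   # 10
# stride 2 halves the size, much like a 2x2 pooling layer
print(conv_output_length(10, 3, 2, 'same'))   # 5
# without padding the border effect also shrinks the output
print(conv_output_length(10, 3, 1, 'valid'))  # 8
```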

The transpose convolutional layer is like an inverse convolutional layer. As such, you would intuitively think that a 2×2 stride would upsample the input instead of downsample, which is exactly what happens.

Stride or strides refers to the manner in which a filter scans across an input in a traditional convolutional layer, whereas in a transpose convolutional layer, stride refers to the manner in which outputs in the feature map are laid down.

This effect can be implemented with a normal convolutional layer using a fractional input stride (f), e.g. with a stride of f=1/2. When inverted, the output stride is set to the denominator of this fraction, e.g. f=2.

In a sense, upsampling with factor f is convolution with a fractional input stride of 1/f. So long as f is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of f.

— Fully Convolutional Networks for Semantic Segmentation, 2014.

One way that this effect can be achieved with a normal convolutional layer is by inserting new rows and columns of 0.0 values in the input data.

Finally note that it is always possible to emulate a transposed convolution with a direct convolution. The disadvantage is that it usually involves adding many columns and rows of zeros to the input …

— A Guide To Convolution Arithmetic For Deep Learning, 2016.

Let’s make this concrete with an example.

Consider an input image with the size 2×2 as follows:

         1, 2
Input = (3, 4)

Assuming a single filter with a 1×1 kernel and model weights that result in no changes to the inputs when output (e.g. a model weight of 1.0 and a bias of 0.0), a transpose convolutional operation with an output stride of 1×1 will reproduce the output as-is:

          1, 2
Output = (3, 4)

With an output stride of (2,2), the 1×1 convolution requires the insertion of additional rows and columns into the input image so that the reads of the operation can be performed. Therefore, the input looks as follows:

         1, 0, 2, 0
Input = (0, 0, 0, 0)
         3, 0, 4, 0
         0, 0, 0, 0

The model can then read across this input using an output stride of (2,2) and will output a 4×4 image, in this case with no change as our model weights have no effect by design:

          1, 0, 2, 0
Output = (0, 0, 0, 0)
          3, 0, 4, 0
          0, 0, 0, 0
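
The spacing-out of the input can be reproduced with a few lines of NumPy. This is a sketch of the zero-insertion idea for the 1×1 case only, not of the Keras implementation; the variable names are chosen for this example.

```python
import numpy as np

X = np.asarray([[1, 2],
                [3, 4]])
stride = 2
# start from an all-zero grid that is 'stride' times larger on each dimension
spaced = np.zeros((X.shape[0] * stride, X.shape[1] * stride), dtype=X.dtype)
# place each input value every 'stride' rows and columns
spaced[::stride, ::stride] = X
print(spaced)
```

With a 1×1 filter weight of 1.0 and a bias of 0.0, this spaced-out grid is also the output.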

Worked Example Using the Conv2DTranspose Layer

Keras provides the transpose convolution capability via the Conv2DTranspose layer.

It can be added to your model directly; for example:

...
# define model
model = Sequential()
model.add(Conv2DTranspose(...))

We can demonstrate the behavior of this layer with a simple contrived example.

First, we can define a contrived input image that is 2×2 pixels, as we did in the previous section. We can use specific values for each pixel so that after the transpose convolutional operation, we can see exactly what effect the operation had on the input.

...
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)

Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.

...
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

We can now define our model.

The model has only the Conv2DTranspose layer, which takes 2×2 grayscale images as input directly and outputs the result of the operation.

The Conv2DTranspose both upsamples and performs a convolution. As such, we must specify both the number of filters and the size of the filters as we do for Conv2D layers. Additionally, we must specify a stride of (2,2) because the upsampling is achieved by the stride behavior of the convolution on the input.

Specifying a stride of (2,2) has the effect of spacing out the input. Specifically, rows and columns of 0.0 values are inserted to achieve the desired stride.

In this example, we will use one filter, with a 1×1 kernel and a stride of 2×2 so that the 2×2 input image is upsampled to 4×4.

...
# define model
model = Sequential()
model.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))
# summarize the model
model.summary()

To make it clear what the Conv2DTranspose layer is doing, we will fix the single weight in the single filter to the value of 1.0 and use a bias value of 0.0.

These weights, along with a kernel size of (1,1) will mean that values in the input will be multiplied by 1 and output as-is, and the 0 values in the new rows and columns added via the stride of 2×2 will be output as 0 (e.g. 1 * 0 in each case).

...
# define weights so that they do nothing
weights = [asarray([[[[1]]]]), asarray([0])]
# store the weights in the model
model.set_weights(weights)

We can then use the model to make a prediction, that is, upsample a provided input image.

...
# make a prediction with the model
yhat = model.predict(X)

The output will have four dimensions, like the input; therefore, we can convert it back to a two-dimensional 4×4 array to make it easier to review the result.

...
# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
# summarize output
print(yhat)

Tying all of this together, the complete example of using the Conv2DTranspose layer in Keras is provided below.

# example of using the transpose convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2DTranspose
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))
# define model
model = Sequential()
model.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))
# summarize the model
model.summary()
# define weights so that they do nothing
weights = [asarray([[[[1]]]]), asarray([0])]
# store the weights in the model
model.set_weights(weights)
# make a prediction with the model
yhat = model.predict(X)
# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
# summarize output
print(yhat)

Running the example first creates and summarizes our 2×2 input data.

Next, the model is summarized. We can see that it will output a 4×4 result as we expect, and importantly, the layer has two parameters or model weights: one for the single 1×1 filter and one for the bias. Unlike the UpSampling2D layer, the Conv2DTranspose will learn during training and will attempt to fill in detail as part of the upsampling process.

Finally, the model is used to upsample our input. We can see that the calculations of the cells that involve real values as input result in the real value as output (e.g. 1×1, 1×2, etc.). We can see that where new rows and columns have been inserted by the stride of 2×2, that their 0.0 values multiplied by the 1.0 values in the single 1×1 filter have resulted in 0 values in the output.

[[1 2]
 [3 4]]

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_transpose_1 (Conv2DTr (None, 4, 4, 1)           2
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________


[[1. 0. 2. 0.]
 [0. 0. 0. 0.]
 [3. 0. 4. 0.]
 [0. 0. 0. 0.]]

Remember: this is a contrived case where we artificially specified the model weights so that we could see the effect of the transpose convolutional operation.

In practice, we will use a large number of filters (e.g. 64 or 128), a larger kernel (e.g. 3×3, 5×5, etc.), and the layer will be initialized with random weights that will learn how to effectively upsample with detail during training.
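
To get a feel for what a larger kernel does, the operation can be sketched in NumPy as a scatter-and-add: each input value stamps a scaled copy of the kernel onto the output, with stamps placed ‘stride‘ pixels apart. The function conv2d_transpose_valid is a hypothetical, simplified stand-in (one channel, one filter, no bias, ‘valid‘ padding), not the Keras implementation.

```python
import numpy as np

def conv2d_transpose_valid(image, kernel, stride):
    """Single-channel transpose convolution without padding or bias."""
    in_h, in_w = image.shape
    kh, kw = kernel.shape
    # output size when no padding is used
    out_h = in_h * stride + max(kh - stride, 0)
    out_w = in_w * stride + max(kw - stride, 0)
    out = np.zeros((out_h, out_w))
    for i in range(in_h):
        for j in range(in_w):
            # stamp a copy of the kernel scaled by the input value
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += image[i, j] * kernel
    return out

X = np.asarray([[1, 2],
                [3, 4]])
# 1x1 kernel with a weight of 1.0 reproduces the contrived example
identity = conv2d_transpose_valid(X, np.ones((1, 1)), stride=2)
# 3x3 kernel of ones: neighboring stamps overlap and are summed
overlapped = conv2d_transpose_valid(X, np.ones((3, 3)), stride=2)
print(identity)
print(overlapped.shape)  # (5, 5)
print(overlapped)
```

With the 3×3 kernel the stamps overlap, so the center cell receives a contribution from all four inputs (1 + 2 + 3 + 4 = 10), and the output is 5×5 rather than 4×4.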

In fact, you might imagine how different sized kernels will result in different sized outputs, more than doubling the width and height of the input. In this case, the ‘padding‘ argument of the layer can be set to ‘same‘ to force the output to have the desired (doubled) output shape; for example:

...
# example of using padding to ensure that the output is only doubled
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same', input_shape=(2, 2, 1)))
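
The output size for a given kernel, stride, and padding can be worked out ahead of time. The helper below is a hypothetical sketch of that arithmetic for one dimension, assuming no dilation and no explicit output padding.

```python
def conv_transpose_output_length(input_length, kernel_size, stride, padding):
    """Output length along one dimension of a transpose convolution."""
    if padding == 'same':
        return input_length * stride
    if padding == 'valid':
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError('padding must be "same" or "valid"')

# 2x2 input, 1x1 kernel, stride 2: exactly doubled
print(conv_transpose_output_length(2, 1, 2, 'valid'))  # 4
# 2x2 input, 3x3 kernel, stride 2: slightly more than doubled
print(conv_transpose_output_length(2, 3, 2, 'valid'))  # 5
# 'same' padding trims the output back to exactly double
print(conv_transpose_output_length(2, 3, 2, 'same'))   # 4
```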

Simple Generator Model With the Conv2DTranspose Layer

The Conv2DTranspose is more complex than the UpSampling2D layer, but it is also effective when used in GAN models, specifically the generator model.

Either approach can be used, although the Conv2DTranspose layer is preferred, perhaps because of the simpler generator models and possibly better results, although GAN performance and skill are notoriously difficult to quantify.

We can demonstrate using the Conv2DTranspose layer in a generator model with another simple example.

In this case, our little GAN generator model must produce a 10×10 image and take a 100-element vector from the latent space as input, as in the previous UpSampling2D example.

...
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 images
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))

Next, the 5×5 feature maps can be upsampled to a 10×10 feature map.

We will use a 3×3 kernel size for the single filter, which, with the default ‘valid’ padding, would result in slightly more than doubled width and height in the output feature map (11×11).

Therefore, we will set ‘padding’ to ‘same’ to ensure the output dimensions are 10×10 as required.

...
# double input from 128 5x5 to 1 10x10 feature map
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
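
To see where the 11×11 size comes from, and how 128 input feature maps become a single output map, the operation can be sketched directly in NumPy. This is a simplified illustration rather than the Keras implementation: the function transpose_conv_single_filter is hypothetical, the bias is omitted, the kernel is assumed to be at least as large as the stride, and the way the extra row and column are cropped for 'same' padding is an assumption.

```python
from numpy import ones, zeros, tensordot

def transpose_conv_single_filter(x, kernel, stride):
    """Transpose convolution of a (rows, cols, channels) input with one filter.

    Each input position stamps a scaled copy of the kernel onto the output,
    and contributions from all input channels are summed into one map.
    Assumes 'valid' padding and kernel size >= stride.
    """
    rows, cols, _ = x.shape
    k_rows, k_cols, _ = kernel.shape
    out = zeros(((rows - 1) * stride + k_rows, (cols - 1) * stride + k_cols))
    for i in range(rows):
        for j in range(cols):
            # weight the kernel by each channel's value and sum over channels
            stamp = tensordot(kernel, x[i, j], axes=([2], [0]))
            r, c = i * stride, j * stride
            out[r:r + k_rows, c:c + k_cols] += stamp
    return out

# 128 feature maps of 5x5 and a single 3x3 filter spanning all 128 channels
x = ones((5, 5, 128))
kernel = ones((3, 3, 128))
full = transpose_conv_single_filter(x, kernel, stride=2)
print(full.shape)  # (11, 11)

# assumed cropping for 'same' padding: drop the extra last row and column
same = full[:10, :10]
print(same.shape)  # (10, 10)
```

Because the single filter spans all 128 input channels, the output has one channel regardless of how many feature maps are fed in.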

Tying this together, the complete example is listed below.

# example of using transpose conv in a simple generator model
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2DTranspose
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))
# double input from 128 5x5 to 1 10x10 feature map
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
# summarize model
model.summary()

Running the example creates the model and summarizes the output shape of each layer.

We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5×5.

The widths and heights are doubled to 10×10 by the Conv2DTranspose layer resulting in a single feature map with quadruple the area.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 3200)              323200
_________________________________________________________________
reshape_1 (Reshape)          (None, 5, 5, 128)         0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 10, 10, 1)         1153
=================================================================
Total params: 324,353
Trainable params: 324,353
Non-trainable params: 0
_________________________________________________________________
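
The parameter counts in the summary can be reproduced by hand. The sketch below uses the usual counting rule of one weight per input-output connection (or per kernel element per input channel) plus one bias per output. The variable names are for illustration only.

```python
# Dense layer: 100 inputs fully connected to 3,200 outputs, plus one bias per output
dense_params = 100 * (128 * 5 * 5) + (128 * 5 * 5)
print(dense_params)  # 323200

# Conv2DTranspose layer: one 3x3 filter spanning 128 input channels, plus one bias
conv_transpose_params = 3 * 3 * 128 * 1 + 1
print(conv_transpose_params)  # 1153

# total trainable parameters
print(dense_params + conv_transpose_params)  # 324353
```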

Summary

In this tutorial, you discovered how to use UpSampling2D and Conv2DTranspose Layers in Generative Adversarial Networks when generating images.

Specifically, you learned:

Generative models in the GAN architecture are required to upsample input data in order to generate an output image.
The Upsampling layer is a simple layer with no weights that will double the dimensions of input and can be used in a generative model when followed by a traditional convolutional layer.
The Transpose Convolutional layer is an inverse convolutional layer that will both upsample input and learn how to fill in details during the model training process.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

43 Responses to How to use the UpSampling2D and Conv2DTranspose Layers in Keras

erlend June 24, 2019 at 4:18 pm #

I believe that transposed convolutional layers aren’t used any more in generative models because of problems with artifacts. Upsampling (i.e. nearest neighbour resizing) is the standard.

Reply
- Jason Brownlee June 25, 2019 at 6:10 am #
  
  It really depends on the model.
  
  For simpler DCGANs, they work great; for larger models such as Progressive Growing GANs, StyleGAN, etc., upsampling layers are used.
  
  Reply
- Molefe Molefe July 22, 2021 at 10:17 pm #
  
  I tried the convolutions multiple times. My interpretation is that we have 128 5×5 images with different associated convolutions. When we upsample, we have 128 10×10 images. However, how did it become a 10x10x1 image when we only have 128 one-channel filters?
  
  Because there are 128 filters but each has only one channel, I assumed it was a single-channel, multiple-filter convolution. Shouldn’t the result be 128 10×10 images (especially after applying padding to ensure the size remains the same)?
  
  Reply
  - Jason Brownlee July 23, 2021 at 5:59 am #
    
    You may need to debug your example and discover the cause of your misunderstanding, it’s not clear to me.
    
    Reply
Jack June 25, 2019 at 4:10 am #

This is fantastic. Blog posts like these are invaluable because they explain what’s going on under the hood.
Thank you so much Jason. I highly appreciate it and love to read those blogposts from you.
Great work, please keep going!
best regards

Reply
- Jason Brownlee June 25, 2019 at 6:28 am #
  
  Thanks Jack, I’m glad it helps!
  
  Reply
Amit August 20, 2019 at 9:17 am #

Thank you for this blog. Regarding transposed convolution, are the filled values limited to “0”?
If so, the value will always equal 0 and only depend on the bias term.

Reply
- Jason Brownlee August 20, 2019 at 2:11 pm #
  
  Yes, zero values are added as they have no effect on the calculation.
  
  Not sure if that answers your question?
  
  Reply
XX October 31, 2019 at 7:28 pm #

Great tutorial! I like the “contrived but deliberate” approach in defining layer, assigning user-defined weights and predicting the output.

I do have a question about the Conv2DTranspose examples.

1) Assuming X = [[1,2],[3,4]], and the layer is “model.add(Conv2DTranspose(1, (2,2), strides=(2,2), input_shape=(2,2,1)))”, then the upsampled input will be: [[1,0,2,0], [0,0,0,0], [3,0,4,0], [0,0,0,0]]. When I applied model.predict(X), the output looked like a “tiling” with a stride of 2, instead of a “2dconv” with a stride of 2. In other words, whatever the model weights are (the 2×2 kernel [[a,b],[c,d]]), they are multiplied element-wise by the inputs 1, 2, 3, and 4 respectively, then concatenated together, like [
[a, b, 2a, 2b],
[c, d, 2c, 2d],
[3a,3b,4a, 4b],
[3c,3d,4c, 4d]
]

Here is a concrete example:

/*
model = Sequential()
model.add(Conv2DTranspose(1, (2,2), strides=(2,2), padding='valid', input_shape=(2,2,1)))
model.weights
# output:
[,
]

X = np.array([[1,2],[3,4]]).reshape((1,2,2,1))
model.predict(X).reshape((4,4))

#Output:
array([[-0.28302956, 0.67811257, -0.5660591 , 1.3562251 ],
[ 0.19257468, -0.58342797, 0.38514936, -1.1668559 ],
[-0.84908867, 2.0343378 , -1.1321182 , 2.7124503 ],
[ 0.57772404, -1.750284 , 0.7702987 , -2.3337119 ]],
dtype=float32)
/*

In that sense, I found the animation shown in [Convolution Arithmetic Project, GitHub.], particularly the [No padding, strides, transposed] panel somewhat misleading.

Reply
- Jason Brownlee November 1, 2019 at 5:28 am #
  
  Thanks!
  
  Nice. Thanks for sharing.
  
  Reply
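
The tiling behaviour described in the comment above follows from the kernel size matching the stride: the stamped copies of the kernel never overlap. A small NumPy sketch (not the Keras implementation; the kernel values are arbitrary and the bias is assumed to be zero) suggests that the result equals the Kronecker product of the input and the kernel.

```python
from numpy import array, zeros, kron, array_equal

X = array([[1, 2], [3, 4]])
# arbitrary 2x2 kernel standing in for [[a, b], [c, d]]
K = array([[10, 20], [30, 40]])

# transpose convolution with a 2x2 stride: stamp a scaled kernel per input value
out = zeros((4, 4), dtype=int)
for i in range(2):
    for j in range(2):
        out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += X[i, j] * K

print(out)
# with kernel size equal to stride, the stamps tile without overlapping
print(array_equal(out, kron(X, K)))  # True
```

With a kernel larger than the stride, neighbouring stamps would overlap and their values would be summed, so the output would no longer be a simple tiling.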
Rick December 10, 2019 at 9:55 am #

I made the model using conv2dtranspose that can increase the resolution of image from 14×14 to 28X28. Is there a way I can upload a new image of some resolution and feed it to model to get the image of doubled resolution?

Reply
- Jason Brownlee December 10, 2019 at 1:32 pm #
  
  Nice work.
  
  I think you’re referring to super resolution. I don’t have examples of this, but perhaps try google search?
  
  Reply
Rick December 11, 2019 at 3:32 am #

Thank you

Reply
- Jason Brownlee December 11, 2019 at 7:02 am #
  
  You’re welcome.
  
  Reply
Raj December 15, 2019 at 1:31 pm #

Hi Jason,

Thanks for the wonderful blog!

I am trying to use Conv3DTranspose in Keras, but it’s acting like a Conv3D instead of a Conv3DTranspose: the size is halved instead of doubled. My TensorFlow version is 1.4.

Reply
- Jason Brownlee December 16, 2019 at 6:09 am #
  
  Sorry, I don’t have tutorials on the Conv3DTranspose, I can’t give you good advice about it.
  
  Reply
- Anonymous February 19, 2020 at 12:38 am #
  
  did you try it with stride 0.5 instead of 2?
  
  Reply
D.S. LEE April 21, 2020 at 10:42 am #

Crystal clear. Thank you for your good quality blog as always.

Reply
- Jason Brownlee April 21, 2020 at 11:46 am #
  
  Thanks!
  
  You’re very welcome.
  
  Reply
Shreyash May 9, 2020 at 1:30 pm #

This is a very helpful tutorial. Have been wandering around all over and there comes the perfect blog post to explain it at root level. Great JOB!!

Reply
- Jason Brownlee May 9, 2020 at 1:51 pm #
  
  You’re very welcome!
  
  Reply
Vaibhav May 19, 2020 at 9:33 am #

Hello Dr. Jason. Thanks for the tutorial.
I’m struggling with the output shape in the code blocks related to Conv2DTranspose:
stride = (2, 2), kernel_size = (1, 1)
Your output shape and the one out of TensorFlow code matches but I can’t understand why the last row of zeros is part of the output as the allowed number of strides is over and the kernel is of (1×1)?
I read the pdf on arxiv “A guide to convolution arithmetic for deep learning”, in that, in the last section of Transposed convolution (pg. 26) there are two formulas provided for calculating the output shape.
formula 1) out_shape = s(i − 1) + k − 2p
formula 2) out_shape = s(i − 1) + a + k − 2p; where a = (n + 2p – k) % s
s=stride, i=input_size, k=kernel_size, p=padding

In TensorFlow’s documentation, the output shape calculation is given as:
new_rows = ((rows - 1) * strides[0] + kernel_size[0] - 2 * padding[0] + output_padding[0])
new_cols = ((cols - 1) * strides[1] + kernel_size[1] - 2 * padding[1] + output_padding[1])
and it’s the same in the PyTorch documentation. Both are the same as formula 1.

I dug around TensorFlow’s implementation of this function and found the output size is calculated as:
out_size = input * stride + max(kernel - stride, 0)

I went ahead and created some cases for my understanding and now I’m lost.
Case 1) Input: 2, Stride: 1, kernel: 1
Formula 1: O/P Shape: 2
Formula 2: O/P Shape: 2
TF/Pytorch doc: O/P Shape: 2
TF code O/P: O/P Shape: 2

Case 2) Input: 2, Stride: 1, kernel: 3
Formula 1: O/P Shape: 4
Formula 2: O/P Shape: 4
TF/Pytorch doc: O/P Shape: 4
TF code O/P: O/P Shape: 4

It gets confusing from here
Case 3) Input: 2, Stride: 2, kernel: 1
Formula 1: O/P Shape: 3
Formula 2: O/P Shape: 4 (same as in this tutorial)
TF/Pytorch doc: O/P Shape: 3
TF code O/P: O/P Shape: 4

Case 4) Input: 2, Stride: 2, kernel: 3
Formula 1: O/P Shape: 5
Formula 2: O/P Shape: 6
TF/Pytorch doc: O/P Shape: 5
TF code O/P: O/P Shape: 5

These are all for padding = “Valid”
Which is the correct one? I’m lost.
Is there a single formula to correctly calculate the output shape like for normal Convolution process.
Any help would be appreciated, I’m too confused.

Reply
- Jason Brownlee May 19, 2020 at 1:24 pm #
  
  I think you’re asking about 1×1 convolutions. If so, perhaps this will help:
  https://machinelearningmastery.com/introduction-to-1×1-convolutions-to-reduce-the-complexity-of-convolutional-neural-networks/
  
  Reply
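
The four cases in the question above can be tabulated with a short sketch. It compares formula 1 from the comment (with p = 0) against the rule the comment quotes from the TensorFlow code. Both function names are hypothetical. Algebraically, the two agree whenever the kernel is at least as large as the stride, and differ only in case 3, where the kernel is smaller than the stride.

```python
def formula_1(i, s, k, p=0):
    # out = s(i - 1) + k - 2p
    return s * (i - 1) + k - 2 * p

def tf_code_rule(i, s, k):
    # out = input * stride + max(kernel - stride, 0), for 'valid' padding
    return i * s + max(k - s, 0)

# (input, stride, kernel) for cases 1 to 4
cases = [(2, 1, 1), (2, 1, 3), (2, 2, 1), (2, 2, 3)]
for n, (i, s, k) in enumerate(cases, start=1):
    print('Case %d: formula 1 = %d, TF code rule = %d' % (n, formula_1(i, s, k), tf_code_rule(i, s, k)))
```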
Ryu July 16, 2020 at 8:39 pm #

Thank you for the great explanation!

As the next step, I would like to know how to design a kernel size, stride and padding size. For example, if an input shape is (1024, 4, 4), the different settings of transposed convolution layers below produce the same size of the output which is (512, 16, 16).

Setting A: kernel=4, stride=2, padding=1
Setting B: kernel=5, stride=1, padding=0

Is there any clue to design them?

Reply
- Jason Brownlee July 17, 2020 at 6:15 am #
  
  Yes, generally scaling up or down involves doubling or halving each axis.
  
  Perhaps look at some worked examples, such as a GAN with a unet to see the pattern (search on the blog).
  
  Reply
Carlos October 24, 2020 at 2:59 am #

Hi Jason,

How do I get a job with these ML projects?
If you were a recruiter, apart from concepts, what projects would you really want to see from an ML engineer?

I want to build a job-ready portfolio of ML projects.

Reply
- Jason Brownlee October 24, 2020 at 7:09 am #
  
  Depends on what interests you or what job you’re going after, what the company does, etc.
  
  Reply
Anthony The Koala December 9, 2020 at 3:05 am #

Dear Dr Jason,
In one of the examples, you had this expression:

weights = [asarray([[[[1]]]]),asarray([0])]
weights
[array([[[[1]]]]), array([0])]

What is the purpose of enclosing 1 in four brackets, but 0 in only one bracket?

Thank you
Anthony of Sydney

Reply
- Jason Brownlee December 9, 2020 at 6:28 am #
  
  As I recall, that was the structure expected by the model, no doubt something like [model[layer[node[weight]]]]
  
  Reply
  - Anthony The Koala December 9, 2020 at 6:48 am #
    
    Dear Dr Jason,
    Thank you for your reply.
    I googled the structure
    
    weights = [asarray([[[[1]]]]),asarray([0])]
    weights
    [array([[[[1]]]]), array([0])]
    
    and
    
    [model[layer[node[weight]]]]
    
    and could not find any notes.
    
    Could you please point me to notes on the structure expected by the model including the sequence of model[layer[node[weight]]]]
    
    Thank you,
    Anthony of Sydney
    
    Reply
    - Jason Brownlee December 9, 2020 at 7:54 am #
      
      I don’t recall notes on the topic. I think I figured out the required structure by inspecting the code; it was a few years ago.
      
      Reply
Anthony The Koala December 11, 2020 at 8:26 am #

Dear Dr Jason,
I tried the document at https://keras.io/api/layers/base_layer/ but could not relate the set_weights in this tutorial where

weights = [asarray([[[[1]]]]),asarray([0])]

Nothing in the documentation that relates to the tutorial.
Thank you,
Anthony of Sydney

Reply
- Jason Brownlee December 11, 2020 at 1:30 pm #
  
  You can learn more about the shape of model weights by inspecting the code itself, or perhaps retrieving the weights for a model and reviewing their shape. I recall it was relatively straightforward.
  
  Reply

Anthony The Koala December 11, 2020 at 3:27 pm #

Dear Dr Jason,
I got the weights of the model

model.get_weights()
[array([[[[1.]]]], dtype=float32), array([0.], dtype=float32)]

I still cannot relate the weights to the input given

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_transpose (Conv2DTran (None, 4, 4, 1)           2         
=================================================================
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________

Why are the weights a list with the first parameter [[[[0]]]] – what does it mean?
And what is the purpose of setting the weights and how much weighting?

Thank you,
Anthony of Sydney

Jason Brownlee December 12, 2020 at 6:22 am #

That is an array with a given number of dimensions as I explained previously.

We can set the weights manually for small experiments, e.g. setting them to 1 to see the effect of different operations, as I do in my intro to conv and pooling operations. In practice we would not set the weights manually.

Reply
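
The nesting can be made concrete by checking the array shapes with NumPy. The interpretation of the four axes in the comments below (kernel rows, kernel columns, output filters, input channels) is an assumption about how the Conv2DTranspose kernel is laid out. The second array holds one bias per filter.

```python
from numpy import asarray

weights = [asarray([[[[1]]]]), asarray([0])]

# kernel: assumed layout (kernel rows, kernel cols, output filters, input channels)
print(weights[0].shape)  # (1, 1, 1, 1)
# bias: one value per output filter
print(weights[1].shape)  # (1,)
# total parameters, matching the 2 reported by model.summary()
print(weights[0].size + weights[1].size)  # 2
```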

Anthony The Koala December 12, 2020 at 11:02 am #

Dear Dr Jason,
Thank you again for your reply. I understand that weights are not set manually. The weights are arrived as the result of computing the weights of each neurone.

Background – please skip and go to question
Nevertheless, I still want to understand the structure of the weights. I have used model.get_weights() from your tutorial at https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/

Snippet of code:

def baseline_model():
	# create model
	model = Sequential()
	model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
	model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# build the model
model = baseline_model()

#Finding out the structure of the weights

my_weights = model.get_weights()
shape(my_weights)
(4,)
#There are four rows.
len(my_weights[0]), len(my_weights[1]), len(my_weights[2]), len(my_weights[3])
(784, 784, 784, 10)

num_classes
10
num_pixels
784

Question please:
* Given that there are two layers, why are there three sets of 784 weights (= num_pixels) and another set of 10 (= num_classes)? See the baseline_model() code.

Thank you,
Anthony of Sydney

Jason Brownlee December 12, 2020 at 1:24 pm #

I believe there are standard formulas for calculating the number of weights in a layer, sorry, I don’t recall them off the top of my head.

Reply
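
The lengths reported in the question above can be reproduced without Keras. The sketch below assumes that each Dense layer contributes a kernel of shape (inputs, units) followed by a bias of shape (units,), so a two-layer model returns four arrays. Zero-filled NumPy arrays stand in for the real weights.

```python
from numpy import zeros

num_pixels, num_classes = 784, 10

# stand-ins for model.get_weights(): [kernel_1, bias_1, kernel_2, bias_2]
my_weights = [
    zeros((num_pixels, num_pixels)),   # first Dense kernel
    zeros((num_pixels,)),              # first Dense bias
    zeros((num_pixels, num_classes)),  # second Dense kernel
    zeros((num_classes,)),             # second Dense bias
]

# len() reports only the first dimension of each array
print([len(w) for w in my_weights])  # [784, 784, 784, 10]
print([w.shape for w in my_weights])
```

Under this assumption, the four entries are two kernel-and-bias pairs rather than four layers, and len() hides the second dimension of each kernel.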

ghambi May 1, 2021 at 1:44 am #

hi
I want to implement a GAN to reduce compression artefacts, but I don’t know how to do it. Can you give me an idea, please?

Reply
- Jason Brownlee May 1, 2021 at 6:07 am #
  
  Perhaps you can model it as an image translation task with a cyclegan or a pix2pix gan.
  
  Reply
Vijay Srinivas Tida April 18, 2022 at 11:59 am #

Hi may I know the applications where the transpose convolution with a combination of upsampling version of 1 and convolution layer is used ?

Reply
- James Carmichael April 19, 2022 at 7:14 am #
  
  Hi Vijay…You may find the following helpful:
  
  https://dsp.stackexchange.com/questions/3228/practical-applications-of-upsampling-and-downsampling
  
  Reply
  - Vijay Srinivas Tida April 19, 2022 at 1:09 pm #
    
    Hi James, Thanks for your response. I just want to know under the topic Simple Generator Model With the UpSampling2D Layer it showed using
    
    model.add(UpSampling2D())
    # fill in detail in the upsampled feature maps and output a single image
    model.add(Conv2D(1, (3,3), padding='same'))
    
    Do we have any standard applications using this upsampling layer version 1 and convolution layer for deep learning models? I saw only the applications related to using upsampling layer version 2 for transpose convolution layer.
    
    Reply