How to Develop a Pix2Pix GAN for Image-to-Image Translation

The Pix2Pix Generative Adversarial Network, or GAN, is an approach to training a deep convolutional neural network for image-to-image translation tasks.

The careful configuration of the architecture as a type of image-conditional GAN allows for both the generation of larger images than prior GAN models (e.g. 256×256 pixels) and good performance on a variety of different image-to-image translation tasks.

In this tutorial, you will discover how to develop a Pix2Pix generative adversarial network for image-to-image translation.

After completing this tutorial, you will know:

  • How to load and prepare the satellite image to Google maps image-to-image translation dataset.
  • How to develop a Pix2Pix model for translating satellite photographs to Google map images.
  • How to use the final Pix2Pix generator model to translate ad hoc satellite images.

Let’s get started.

How to Develop a Pix2Pix Generative Adversarial Network for Image-to-Image Translation
Photo by European Southern Observatory, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Pix2Pix GAN?
  2. Satellite to Map Image Translation Dataset
  3. How to Develop and Train a Pix2Pix Model
  4. How to Translate Images With a Pix2Pix Model
  5. How to Translate Google Maps to Satellite Images

What Is the Pix2Pix GAN?

Pix2Pix is a Generative Adversarial Network, or GAN, model designed for general purpose image-to-image translation.

The approach was introduced by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” and presented at CVPR in 2017.

The GAN architecture is comprised of a generator model for outputting new plausible synthetic images, and a discriminator model that classifies images as real (from the dataset) or fake (generated). The discriminator model is updated directly, whereas the generator model is updated via the discriminator model. As such, the two models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the counterfeit images.

The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case, a source image. The discriminator is provided both with a source image and the target image and must determine whether the target is a plausible transformation of the source image.

The generator is trained via adversarial loss, which encourages the generator to generate plausible images in the target domain. The generator is also updated via L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image.

The Pix2Pix GAN has been demonstrated on a range of image-to-image translation tasks such as converting maps to satellite photographs, black and white photographs to color, and sketches of products to product photographs.

Now that we are familiar with the Pix2Pix GAN, let’s prepare a dataset that we can use with image-to-image translation.

Satellite to Map Image Translation Dataset

In this tutorial, we will use the so-called “maps” dataset used in the Pix2Pix paper.

This is a dataset comprised of satellite images of New York and their corresponding Google maps pages. The image translation problem involves converting satellite photos to Google maps format, or the reverse, Google maps images to satellite photos.

The dataset is provided on the pix2pix website and can be downloaded as a 255-megabyte zip file.

Download the dataset and unzip it into your current working directory. This will create a directory called “maps” containing two subdirectories: train and val.

The train folder contains 1,097 images, whereas the validation dataset contains 1,099 images.

Images have a digit filename and are in JPEG format. Each image is 1,200 pixels wide and 600 pixels tall and contains both the satellite image on the left and the Google maps image on the right.

Sample Image From the Maps Dataset Including Both Satellite and Google Maps Image.

We can prepare this dataset for training a Pix2Pix GAN model in Keras. We will just work with the images in the training dataset. Each image will be loaded, rescaled, and split into the satellite and Google map elements. The result will be 1,097 color image pairs with the width and height of 256×256 pixels.

The load_images() function below implements this. It enumerates the list of images in a given directory, loads each with the target size of 256×512 pixels, splits each image into satellite and map elements and returns an array of each.
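The splitting step can be sketched in isolation. This is a minimal illustration, not the tutorial's load_images() function: it assumes the combined image has already been loaded and resized to 256×512 as a NumPy array, and it uses a synthetic array in place of a real file. The helper name split_pair() is hypothetical.

```python
import numpy as np

def split_pair(pixels):
    """Split one combined (height, 2 * width, channels) image into left and right halves."""
    half = pixels.shape[1] // 2
    sat_img = pixels[:, :half]   # left half: satellite photo
    map_img = pixels[:, half:]   # right half: Google maps image
    return sat_img, map_img

# Synthetic stand-in for one image loaded with a target size of 256x512 (rows x columns)
combined = np.zeros((256, 512, 3), dtype=np.uint8)
combined[:, 256:] = 255  # mark the right half so the two halves are distinguishable

sat_img, map_img = split_pair(combined)
print(sat_img.shape, map_img.shape)
```

Applying the same split to every image in the directory and stacking the results would give one array of satellite images and one array of map images.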

We can call this function with the path to the training dataset. Once loaded, we can save the prepared arrays to a new file in compressed format for later use.

The complete example is listed below.

Running the example loads all images in the training dataset, summarizes their shape to ensure the images were loaded correctly, then saves the arrays to a new file called maps_256.npz in compressed NumPy array format.

This file can be loaded later via the load() NumPy function and retrieving each array in turn.
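As a rough sketch of the save-and-reload round trip, the snippet below writes two small placeholder arrays with NumPy's savez_compressed() and reads them back. The arrays and the temporary directory are assumptions for illustration only; arrays passed positionally are stored under the keys arr_0 and arr_1.

```python
import os
import tempfile
import numpy as np

# Placeholder arrays standing in for the prepared satellite and map images
src_images = np.zeros((4, 256, 256, 3), dtype=np.uint8)
tar_images = np.ones((4, 256, 256, 3), dtype=np.uint8)

# Save both arrays into one compressed file (a temporary directory is used here)
path = os.path.join(tempfile.mkdtemp(), "maps_256.npz")
np.savez_compressed(path, src_images, tar_images)

# Load the file and retrieve each array in turn by its default key
with np.load(path) as data:
    loaded_src, loaded_tar = data["arr_0"], data["arr_1"]
print("Loaded:", loaded_src.shape, loaded_tar.shape)
```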

We can then plot some image pairs to confirm the data has been handled correctly.

Running this example loads the prepared dataset and summarizes the shape of each array, confirming our expectations of a little over one thousand 256×256 image pairs.

A plot of three image pairs is also created showing the satellite images on the top and Google map images on the bottom.

We can see that satellite images are quite complex and that although the Google map images are much simpler, they have color codings for things like major roads, water, and parks.

Plot of Three Image Pairs Showing Satellite Images (top) and Google Map Images (bottom).

Now that we have prepared the dataset for image translation, we can develop our Pix2Pix GAN model.

How to Develop and Train a Pix2Pix Model

In this section, we will develop the Pix2Pix model for translating satellite photos to Google maps images.

The same model architecture and configuration described in the paper was used across a range of image translation tasks. This architecture is described in the body of the paper, with additional detail in the appendix, and a fully working implementation is provided as open source for the Torch deep learning framework.

The implementation in this section will use the Keras deep learning framework, based directly on the model described in the paper and implemented in the authors’ code base, designed to take and generate color images with the size 256×256 pixels.

The architecture is comprised of two models: the discriminator and the generator.

The discriminator is a deep convolutional neural network that performs image classification, specifically conditional-image classification. It takes both the source image (e.g. satellite photo) and the target image (e.g. Google maps image) as input and predicts the likelihood of whether the target image is real or a fake translation of the source image.

The discriminator design is based on the effective receptive field of the model, which defines the relationship between one output of the model and the number of pixels in the input image. This is called a PatchGAN model and is carefully designed so that each output prediction of the model maps to a 70×70 square or patch of the input image. The benefit of this approach is that the same model can be applied to input images of different sizes, e.g. larger or smaller than 256×256 pixels.
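To make the receptive field idea concrete, we can work backward from a single output value and ask how many input pixels it depends on. The sketch below uses the standard recurrence for stacked convolutions. The layer list is an assumption: 4×4 kernels with three stride-2 layers followed by two stride-1 layers, which is one configuration that yields a 70×70 patch.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    layers is a list of (kernel_size, stride) tuples in input-to-output order.
    """
    rf = 1
    # Walk from the output back to the input, growing the field at each layer
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# Assumed configuration: 4x4 kernels, three stride-2 layers, then two stride-1 layers
patchgan_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_70))             # 70

# Adding one more stride-2 layer at the input roughly doubles the patch size
print(receptive_field([(4, 2)] + patchgan_70))  # 142
```

The second call shows how sensitive the patch size is to the number of stride-2 layers, which is worth checking whenever the discriminator is modified.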

The output of the model depends on the size of the input image but may be one value or a square activation map of values. Each value is a probability for the likelihood that a patch in the input image is real. These values can be averaged to give an overall likelihood or classification score if needed.

The define_discriminator() function below implements the 70×70 PatchGAN discriminator model as per the design of the model in the paper. The model takes two input images that are concatenated together and outputs a patch of predictions. The model is optimized using binary cross entropy, and a weighting is used so that updates to the model have half (0.5) the usual effect. The authors of Pix2Pix recommend this weighting of model updates to slow down changes to the discriminator, relative to the generator model during training.

The generator model is more complex than the discriminator model.

The generator is an encoder-decoder model using a U-Net architecture. The model takes a source image (e.g. satellite photo) and generates a target image (e.g. Google maps image). It does this by first downsampling or encoding the input image down to a bottleneck layer, then upsampling or decoding the bottleneck representation to the size of the output image. The U-Net architecture means that skip-connections are added between the encoding layers and the corresponding decoding layers, forming a U-shape.
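The U-shape can be illustrated by tracking only the spatial size of the feature maps. This sketch assumes that every encoder step halves the width and height, that every decoder step doubles them, and that eight steps are used to take a 256×256 image down to a 1×1 bottleneck. The function name and the step count are illustrative assumptions.

```python
def unet_feature_sizes(image_size=256, n_down=8):
    """Spatial size after each stride-2 encoder step and each stride-2 decoder step."""
    encoder = [image_size // (2 ** i) for i in range(1, n_down + 1)]
    decoder = [encoder[-1] * (2 ** i) for i in range(1, n_down + 1)]
    return encoder, decoder

encoder, decoder = unet_feature_sizes()
print(encoder)  # [128, 64, 32, 16, 8, 4, 2, 1]
print(decoder)  # [2, 4, 8, 16, 32, 64, 128, 256]

# A skip connection joins a decoder output to the encoder output of the same size
skip_sizes = [size for size in decoder if size in encoder]
print(skip_sizes)  # [2, 4, 8, 16, 32, 64, 128]
```

Each matching size is a place where encoder features can be concatenated onto decoder features, so fine spatial detail does not have to pass through the bottleneck.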

The image below makes the skip-connections clear, showing how the first layer of the encoder is connected to the last layer of the decoder, and so on.

Architecture of the U-Net Generator Model
Taken from Image-to-Image Translation With Conditional Adversarial Networks

The encoder and decoder of the generator are comprised of standardized blocks of convolutional, batch normalization, dropout, and activation layers. This standardization means that we can develop helper functions to create each block of layers and call them repeatedly to build up the encoder and decoder parts of the model.

The define_generator() function below implements the U-Net encoder-decoder generator model. It uses the define_encoder_block() helper function to create blocks of layers for the encoder and the decoder_block() function to create blocks of layers for the decoder. The tanh activation function is used in the output layer, meaning that pixel values in the generated image will be in the range [-1,1].

The discriminator model is trained directly on real and generated images, whereas the generator model is not.

Instead, the generator model is trained via the discriminator model. It is updated to minimize the loss predicted by the discriminator for generated images marked as “real.” As such, it is encouraged to generate more realistic images. The generator is also updated to minimize the L1 loss or mean absolute error between the generated image and the target image.

The generator is updated via a weighted sum of both the adversarial loss and the L1 loss, where the authors of the model recommend a weighting of 100 to 1 in favor of the L1 loss. This is to encourage the generator strongly toward generating plausible translations of the input image, and not just plausible images in the target domain.
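The weighted sum can be worked through with plain numbers. The sketch below is not the Keras implementation: it computes binary cross entropy against “real” labels and mean absolute error by hand on a few made-up values, then combines them with weights of 1 and 100. All function names and inputs are illustrative assumptions.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def mean_absolute_error(target, generated):
    """L1 loss averaged over all values."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)

def generator_loss(patch_preds, target_pixels, generated_pixels,
                   adv_weight=1.0, l1_weight=100.0):
    # Adversarial term: the generator wants every patch to be classified as real (1)
    adv = binary_cross_entropy([1.0] * len(patch_preds), patch_preds)
    # L1 term: the generated image should be close to the expected target image
    l1 = mean_absolute_error(target_pixels, generated_pixels)
    return adv_weight * adv + l1_weight * l1, adv, l1

total, adv, l1 = generator_loss(
    patch_preds=[0.5, 0.5, 0.5, 0.5],
    target_pixels=[1.0, -1.0, 0.0, 0.5],
    generated_pixels=[0.9, -0.8, 0.1, 0.5],
)
print(round(adv, 4), round(l1, 4), round(total, 4))
```

With these values the L1 term contributes roughly ten to the total while the adversarial term contributes less than one, which shows how strongly the 100 to 1 weighting pulls the generator toward the expected target.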

This can be achieved by defining a new logical model comprised of the weights in the existing standalone generator and discriminator model. This logical or composite model involves stacking the generator on top of the discriminator. A source image is provided as input to the generator and to the discriminator, although the output of the generator is connected to the discriminator as the corresponding “target” image. The discriminator then predicts the likelihood that the generated image was a real translation of the source image.

The discriminator is updated in a standalone manner, so the weights are reused in this composite model but are marked as not trainable. The composite model is updated with two targets, one indicating that the generated images were real (cross entropy loss), forcing large weight updates in the generator toward generating more realistic images, and the expected real translation of the image, which is compared against the output of the generator model (L1 loss).

The define_gan() function below implements this, taking the already-defined generator and discriminator models as arguments and using the Keras functional API to connect them together into a composite model. Both loss functions are specified for the two outputs of the model and the weights used for each are specified in the loss_weights argument to the compile() function.

Next, we can load our paired images dataset in compressed NumPy array format.

This will return a list of two NumPy arrays: the first for source images and the second for corresponding target images.
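A minimal sketch of such a loading function is shown below. It assumes the file was written with savez_compressed() so that the arrays are stored under the default keys arr_0 and arr_1, and it scales pixel values from [0,255] to [-1,1] to match a tanh generator output. The small temporary file exists only to make the example self-contained.

```python
import os
import tempfile
import numpy as np

def load_real_samples(filename):
    """Load the paired arrays and scale pixel values from [0, 255] to [-1, 1]."""
    with np.load(filename) as data:
        X1, X2 = data["arr_0"], data["arr_1"]
    X1 = (X1.astype("float32") - 127.5) / 127.5
    X2 = (X2.astype("float32") - 127.5) / 127.5
    return [X1, X2]

# Tiny stand-in dataset so the function can be exercised without the real file
demo_path = os.path.join(tempfile.mkdtemp(), "demo_pairs.npz")
np.savez_compressed(
    demo_path,
    np.zeros((3, 8, 8, 3), dtype=np.uint8),
    np.full((3, 8, 8, 3), 255, dtype=np.uint8),
)

dataset = load_real_samples(demo_path)
print(dataset[0].shape, dataset[0].min(), dataset[1].max())
```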

Training the discriminator will require batches of real and fake images.

The generate_real_samples() function below will prepare a batch of random pairs of images from the training dataset, and the corresponding discriminator label of class=1 to indicate they are real.

The generate_fake_samples() function below uses the generator model and a batch of real source images to generate an equivalent batch of target images for the discriminator.

These are returned with the label class=0 to indicate to the discriminator that they are fake.
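Both helpers can be sketched with NumPy alone. This version makes several assumptions: labels are shaped to match a square patch output of side patch_shape, a random generator object is passed in explicitly, and the trained generator is replaced by any callable that maps a batch of source images to a batch of target images. Here np.tanh is used as a stand-in so the example runs without a model.

```python
import numpy as np

def generate_real_samples(dataset, n_samples, patch_shape, rng):
    """Pick random source/target pairs and label every patch as real (1)."""
    src_images, tar_images = dataset
    ix = rng.integers(0, src_images.shape[0], n_samples)
    y = np.ones((n_samples, patch_shape, patch_shape, 1))
    return [src_images[ix], tar_images[ix]], y

def generate_fake_samples(generate_fn, src_batch, patch_shape):
    """Translate a batch of source images and label every patch as fake (0)."""
    fake = generate_fn(src_batch)
    y = np.zeros((len(fake), patch_shape, patch_shape, 1))
    return fake, y

rng = np.random.default_rng(1)
dataset = [rng.uniform(-1, 1, (5, 8, 8, 3)), rng.uniform(-1, 1, (5, 8, 8, 3))]

[X_src, X_tar], y_real = generate_real_samples(dataset, 2, patch_shape=4, rng=rng)
X_fake, y_fake = generate_fake_samples(np.tanh, X_src, patch_shape=4)
print(X_src.shape, y_real.shape, X_fake.shape, y_fake.shape)
```

With a real Keras generator, the callable would typically be the model's predict() method.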

Typically, GAN models do not converge; instead, an equilibrium is found between the generator and discriminator models. As such, we cannot easily judge when training should stop. Therefore, we can save the model and use it to generate sample image-to-image translations periodically during training, such as every 10 training epochs.

We can then review the generated images at the end of training and use the image quality to choose a final model.

The summarize_performance() function implements this, taking the generator model at a point during training and using it to generate a number, in this case three, of translations of randomly selected images in the dataset. The source, generated image, and expected target are then plotted as three rows of images and the plot saved to file. Additionally, the model is saved to an H5 formatted file that makes it easier to load later.

Both the image and model filenames include the training iteration number, allowing us to easily tell them apart at the end of training.

Finally, we can train the generator and discriminator models.

The train() function below implements this, taking the defined generator, discriminator, composite model, and loaded dataset as input. The number of epochs is set at 100 to keep training times down, although 200 was used in the paper. A batch size of 1 is used as is recommended in the paper.

Training involves a fixed number of training iterations. There are 1,097 images in the training dataset. One epoch is one pass through this number of examples, which with a batch size of one means 1,097 training steps. The generator is saved and evaluated every 10 epochs or every 10,970 training steps, and the model will run for 100 epochs, or a total of 109,700 training steps.
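The step counts above follow from simple arithmetic, which can be checked directly. The variable names are illustrative.

```python
n_images = 1097   # images in the training dataset
n_epochs = 100
n_batch = 1

batches_per_epoch = n_images // n_batch
n_steps = batches_per_epoch * n_epochs
save_every = batches_per_epoch * 10   # evaluate and save every 10 epochs

# Training steps at which the generator would be saved and evaluated
checkpoints = [step for step in range(1, n_steps + 1) if step % save_every == 0]
print(batches_per_epoch, n_steps, save_every, len(checkpoints))
# 1097 109700 10970 10
```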

Each training step involves first selecting a batch of real examples, then using the generator to generate a batch of matching fake samples using the real source images. The discriminator is then updated with the batch of real images and then fake images.

Next, the generator model is updated providing the real source images as input and providing class labels of 1 (real) and the real target images as the expected outputs of the model required for calculating loss. The generator has two loss scores as well as the weighted sum score returned from the call to train_on_batch(). We are only interested in the weighted sum score (the first value returned) as it is used to update the model weights.

Finally, the loss for each update is reported to the console each training iteration and model performance is evaluated every 10 training epochs.

Tying all of this together, the complete code example of training a Pix2Pix GAN to translate satellite photos to Google maps images is listed below.

The example can be run on CPU hardware, although GPU hardware is recommended.

The example might take about two hours to run on modern GPU hardware.

Note: your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

The loss is reported each training iteration, including the discriminator loss on real examples (d1), discriminator loss on generated or fake examples (d2), and generator loss, which is a weighted sum of adversarial and L1 loss (g).

If loss for the discriminator goes to zero and stays there for a long time, consider re-starting the training run as it is an example of a training failure.

Models are saved every 10 epochs and saved to a file with the training iteration number. Additionally, images are generated every 10 epochs and compared to the expected target images. These plots can be assessed at the end of the run and used to select a final generator model based on generated image quality.

At the end of the run, you will have 10 saved model files and 10 plots of generated images.

After the first 10 epochs, map images are generated that look plausible, although the lines for streets are not entirely straight and images contain some blurring. Nevertheless, large structures are in the right places with mostly the right colors.

Plot of Satellite to Google Map Translated Images Using Pix2Pix After 10 Training Epochs

Generated images after about 50 training epochs begin to look very realistic, at least to me, and quality appears to remain good for the remainder of the training process.

Note the first generated image example below (right column, middle row) that includes more useful detail than the real Google map image.

Plot of Satellite to Google Map Translated Images Using Pix2Pix After 100 Training Epochs

Now that we have developed and trained the Pix2Pix model, we can explore how it can be used in a standalone manner.

How to Translate Images With a Pix2Pix Model

Training the Pix2Pix model results in many saved models and samples of generated images for each.

More training epochs do not necessarily mean a better quality model. Therefore, we can choose a model based on the quality of the generated images and use it to perform ad hoc image-to-image translation.

In this case, we will use the model saved at the end of the run, e.g. after 100 epochs or 109,600 training iterations.

A good starting point is to load the model and use it to make ad hoc translations of source images in the training dataset.

First, we can load the training dataset. We can use the same function named load_real_samples() for loading the dataset as was used when training the model.

This function can be called as follows:

Next, we can load the saved Keras model.

Next, we can choose a random image pair from the training dataset to use as an example.

We can provide the source satellite image as input to the model and use it to predict a Google map image.

Finally, we can plot the source, generated image, and the expected target image.

The plot_images() function below implements this, providing a nice title above each image.

This function can be called with each of our source, generated, and target images.

Tying all of this together, the complete example of performing an ad hoc image-to-image translation with an example from the training dataset is listed below.

Running the example will select a random image from the training dataset, translate it to a Google map, and plot the result compared to the expected image.

Your specific results will vary; try running the example a few times.

In this case, we can see that the generated image captures large roads with orange and yellow as well as green park areas. The generated image is not perfect but is very close to the expected image.

Plot of Satellite to Google Map Image Translation With Final Pix2Pix GAN Model

We may also want to use the model to translate a given standalone image.

We can select an image from the validation dataset under maps/val and crop the satellite element of the image. This can then be saved and used as input to the model.

In this case, we will use “maps/val/1.jpg”.

Example Image From the Validation Part of the Maps Dataset

We can use an image program to create a rough crop of the satellite element of this image to use as input and save the file as satellite.jpg in the current working directory.

Example of a Cropped Satellite Image to Use as Input to the Pix2Pix Model.

We must load the image as a NumPy array of pixels with the size of 256×256, rescale the pixel values to the range [-1,1], and then expand the single image dimensions to represent one input sample.
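The scaling and reshaping steps can be sketched separately from the file loading. This example assumes the image has already been loaded and resized to 256×256 as an array of unsigned 8-bit pixels, and it uses a synthetic array in its place. The helper name prepare_pixels() is hypothetical.

```python
import numpy as np

def prepare_pixels(pixels):
    """Scale [0, 255] pixels to [-1, 1] and add a leading sample dimension."""
    pixels = (pixels.astype("float32") - 127.5) / 127.5
    return np.expand_dims(pixels, 0)

# Synthetic stand-in for a loaded 256x256 color image: top half black, bottom half white
image = np.full((256, 256, 3), 255, dtype=np.uint8)
image[:128] = 0

sample = prepare_pixels(image)
print(sample.shape, sample.min(), sample.max())
```

The leading dimension of size one is what lets a single image be passed to a model that expects a batch of samples.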

The load_image() function below implements this, returning image pixels that can be provided directly to a loaded Pix2Pix model.

We can then load our cropped satellite image.

As before, we can load our saved Pix2Pix generator model and generate a translation of the loaded image.

Finally, we can scale the pixel values back to the range [0,1] and plot the result.
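The inverse scaling is a one-line calculation, shown here on individual values for clarity. The function name is illustrative. The same expression can be applied to a whole NumPy array of generated pixels before plotting, since plotting libraries such as Matplotlib expect floating point color images in the range [0,1].

```python
def to_unit_range(value):
    """Map a value from the tanh range [-1, 1] to the plotting range [0, 1]."""
    return (value + 1.0) / 2.0

print([to_unit_range(v) for v in (-1.0, 0.0, 1.0)])  # [0.0, 0.5, 1.0]
```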

Tying this all together, the complete example of performing an ad hoc image translation with a single image file is listed below.

Running the example loads the image from file, creates a translation of it, and plots the result.

The generated image appears to be a reasonable translation of the source image.

The streets do not appear to be straight lines and the detail of the buildings is a bit lacking. Perhaps with further training or choice of a different model, higher-quality images could be generated.

Plot of Satellite Image Translated to Google Maps With Final Pix2Pix GAN Model

How to Translate Google Maps to Satellite Images

Now that we are familiar with how to develop and use a Pix2Pix model for translating satellite images to Google maps, we can also explore the reverse.

That is, we can develop a Pix2Pix model to translate Google map images to plausible satellite images. This requires that the model invent or hallucinate plausible buildings, roads, parks, and more.

We can use the same code to train the model with one small difference. We can change the order of the datasets returned from the load_real_samples() function; for example:
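A small sketch of the swap is given below. It is not the tutorial's load_real_samples() function: the arrays are passed in directly, and the helper names and the map_to_satellite flag are assumptions introduced for illustration. The key line is the return statement, where X2 comes before X1.

```python
import numpy as np

def to_model_range(images):
    """Scale [0, 255] pixels to [-1, 1]."""
    return (images.astype("float32") - 127.5) / 127.5

def order_pairs(sat_images, map_images, map_to_satellite=False):
    X1, X2 = to_model_range(sat_images), to_model_range(map_images)
    # Returning [X2, X1] makes the map images the source and the satellite images the target
    return [X2, X1] if map_to_satellite else [X1, X2]

# Placeholder data: satellite images are all black, map images are all white
sat = np.zeros((2, 4, 4, 3), dtype=np.uint8)
maps = np.full((2, 4, 4, 3), 255, dtype=np.uint8)

source, target = order_pairs(sat, maps, map_to_satellite=True)
print(source.max(), target.min())
```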

Note: the order of X1 and X2 is reversed.

This means that the model will take Google map images as input and learn to generate satellite images.

Run the example as before.

Note: your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

As before, the loss of the model is reported each training iteration. If loss for the discriminator goes to zero and stays there for a long time, consider re-starting the training run as it is an example of a training failure.

It is harder to judge the quality of generated satellite images; nevertheless, plausible images are generated after just 10 epochs.

Plot of Google Map to Satellite Translated Images Using Pix2Pix After 10 Training Epochs

As before, image quality will improve and will continue to vary over the training process. A final model can be chosen based on generated image quality, not total training epochs.

The model appears to have little difficulty in generating reasonable water, parks, roads, and more.

Plot of Google Map to Satellite Translated Images Using Pix2Pix After 90 Training Epochs

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Standalone Satellite. Develop an example of translating standalone Google map images to satellite images, as we did for satellite to Google map images.
  • New Image. Locate a satellite image for an entirely new location and translate it to a Google map and consider the result compared to the actual image in Google maps.
  • More Training. Continue training the model for another 100 epochs and evaluate whether the additional training results in further improvements in image quality.
  • Image Augmentation. Use some minor image augmentation during training as described in the Pix2Pix paper and evaluate whether it results in better quality generated images.

If you explore any of these extensions, I’d love to know.
Post your findings in the comments below.

Summary

In this tutorial, you discovered how to develop a Pix2Pix generative adversarial network for image-to-image translation.

Specifically, you learned:

  • How to load and prepare the satellite image to Google maps image-to-image translation dataset.
  • How to develop a Pix2Pix model for translating satellite photographs to Google map images.
  • How to use the final Pix2Pix generator model to translate ad hoc satellite images.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

152 Responses to How to Develop a Pix2Pix GAN for Image-to-Image Translation

  1. Deepanshu SIngh August 2, 2019 at 7:33 am #

    Amazing tutorial. Detailed and clear explanation of concepts as well as the codes.

    Thanks & Regards

  2. Sean O'Connor August 3, 2019 at 10:38 am #

    From a digital signal processing viewpoint a weighted sum is an adjustable filter.
    Each layer in a conventional artificial neural network has n of those filters and the total compute is a brutal n squared fused multiply accumulates.
    A fast Fourier transform is a fixed (nonadjustable) bank of filters, where each filter picks out frequency/phase.
    There are other transforms that act as filter banks too, such as the fast Walsh Hadamard transform and these often require far less compute (eg. nlog(n)) than a filter bank of weighed sums.
    The question then is why not use an efficient transform based filter bank and adjust the nonlinear functions in a neural network by individually parameterizing them?
    Ie. change what you adjust:

    • Jason Brownlee August 4, 2019 at 6:21 am #

      Perhaps test your alternate approach and compare the results Sean?

  3. Villem Lassmann August 6, 2019 at 7:41 pm #

    It seems to me that the discriminator is not a 70×70 PatchGAN, since the 4th layer should not be there. With that layer it seems like the discriminator is a 142×142 PatchGAN. Please correct me if I am mistaken.

  4. Vasudevakrishna August 20, 2019 at 3:50 pm #

    Thanks for the tutorial.
    My question is in original paper they are giving the direction as configurable parameter.
    But in your implementation I am unable to see that one.
    How can do that for both direction.
    Please explain

    • Jason Brownlee August 21, 2019 at 6:34 am #

      I show how to translate images in both directions in the above tutorial.

  5. Alex August 29, 2019 at 11:52 pm #

    Many thanks for this amazing tutorial!

    PS “There are 1,097 images”… and then there are saves every 10970 steps, and 109700 steps overall

  6. Salman Sajd September 1, 2019 at 7:13 am #

    Thanks for an amazing tutorial
    How we use GAN for motion transfer or which type of GAN will best for Motion Transfer?

    • Jason Brownlee September 2, 2019 at 5:24 am #

      I don’t know off hand sorry, perhaps try a search on

  7. Lin September 6, 2019 at 9:52 pm #

    Hello, thanks for the great article.
    I have one question, but why you scale the image to [-1, 1] instead of [0, 1]?
    Does this make the model behave differently?

    • Jason Brownlee September 7, 2019 at 5:29 am #

      Because the generator generates pixels in that range, and the discriminator must “see” all images pixels in the same range.

      The choice of -1,1 for pixels is a gan hack:

      • Dang Tuan Hoang September 12, 2019 at 1:45 pm #

        Hi sir, is it possible to train this model with inputs and output of different sizes?
        For example, I have 3 image a,b,c with size 50x50x3. I want the model to generate c from a,b. First I append a and b to get d with size 50x100x3, then use d as input, c as output

        • Jason Brownlee September 12, 2019 at 1:49 pm #

          Yes, you can use different sized input and output, although the u-net will require modification.

          • Dang Tuan Hoang September 16, 2019 at 5:09 pm #

            Could you give me some more details about how do I need to modify U-net in my case ? I’m not very familiar with this texture

          • Jason Brownlee September 17, 2019 at 6:24 am #

            Sorry, I don’t have the capacity to prepare custom code for you.

            Perhaps experiment with adding/subtracting groups of layers to the decoder part of the model and see the effect on the image size?

          • Dang Tuan Hoang September 17, 2019 at 12:37 pm #

            I know you are very busy so I didn’t ask for custom code, I just need something to start with. Thank for the suggestion sir !

          • Jason Brownlee September 17, 2019 at 2:34 pm #

            Perhaps start with just the function that defines the model and try playing around with it.

  8. David September 27, 2019 at 9:17 am #

    Hi Jason,

    First of all, thank you very much for posting this tutorial, I learned a lot from it.

    I have a question.

    Do u think if I leave the picture resolution as it is rather than compressing them.
    The performance is gonna be better? As my pictures between translation is very minor.

    Thank you!


    • Jason Brownlee September 27, 2019 at 1:15 pm #

      Thanks, I’m happy that it helped.

      Interesting idea. You mean likely working with TIFF images or other lossless formats?

      Probably not, but perhaps test it to confirm.

  9. Alice October 16, 2019 at 6:16 pm #


    How can I increase speed of training? It uses very small portion of gpu memory.

    • Jason Brownlee October 17, 2019 at 6:26 am #

      Some ideas:

      Use less data.
      Use a smaller model.
      Use a faster machine.

      • Alice October 17, 2019 at 8:33 am #

        I am using a machine with 8 gpus (8 X p4000) 🙂
        I mean, for example, while training on darknet, changing batch size directly affects gpu memory usage. But this codes use only 100 mb of each gpu. And batch size doesn’t affect it. So I need an adjustment just like on darknet so that I can use full capability of gpus.

  10. Samuel October 17, 2019 at 8:30 am #

    Swapping out the training data for the SEN1-2 dataset had amazing results. I can now translate Sentinel 1 images to RGB Sentinel 2! Many thanks for such a thorough tutorial.

    • Jason Brownlee October 17, 2019 at 1:49 pm #

      Well done!

      I would love to see an example of a translated image.

  11. Mohammad October 20, 2019 at 12:03 pm #

    Hi Jason,

    Thank you so much for your great website, it is fantastic.

    I was wondering what your opinion is about the future research direction for this area of research?


    • Jason Brownlee October 21, 2019 at 6:14 am #


      Sorry, I don’t have thoughts on research directions – I try to stay focused on the industrial side these days.

  12. Michael October 28, 2019 at 9:08 am #

    Awesome tutorial on Pix2Pix. Your other GAN articles were great too and very helpful. After reading your tutorials, I was able to implement my own Pix2Pix project. All the code is on my GitHub.

  13. sss November 17, 2019 at 10:43 pm #

    – python version: 3.6.7
    – tensorflow-gpu version: 2.0.0
    – keras version: 2.3.1
    – cuDNN version:10.0
    – CUDA version: 10.0 (works perfectly), but the code given below gives me this error:

    tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node conv2d_7/convolution (defined at C:\Users\ACSECKIN\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\ ]] [Op:__inference_keras_scratch_graph_4815]

    Function call stack:

    • Jason Brownlee November 18, 2019 at 6:45 am #

      Perhaps try running on the CPU first to confirm your environment is working?

      • sss November 18, 2019 at 6:04 pm #

        Code works with CPU. But I want to run in the GPU to complete in less time. CPU time is about 16 hours. I am open to any alternative to decrease the training time.

        • Jason Brownlee November 19, 2019 at 7:38 am #

          That is odd. I ran the examples on GPU without incident, specifically on EC2.

          Perhaps try that?

          • Jack November 27, 2019 at 6:12 pm #

            Based on my experience, I receive this error message when my GPU does not have enough memory to handle the process. Maybe try reducing the computational workload by using a smaller image? If not, use CPU if you are fine with it.

          • Jason Brownlee November 28, 2019 at 6:33 am #

            Great suggestions!

  14. Ivan November 23, 2019 at 9:05 pm #

    Hi guys,

    It takes 8 hours to train the model on GPU (Floydhub).
    But several different models were saved in the process.

    Can you explain why?

    • Jason Brownlee November 24, 2019 at 9:17 am #

      Perhaps try training on a large ec2 instance, it is much faster.

      Models are saved periodically during training, we cannot know what a good model will be without using it to generate images.

  15. Jack November 26, 2019 at 2:07 pm #

    Thank you for your awesome post, it is really detailed and helpful! I have a question about normalizing between [0, 255] to [-1, 1]. My images are single channel and the maximum and minimum pixel values vary for each image, from 0 to around 3-4 (depends on the image). How should I go about normalizing the images? Should I take the maximum of the whole batch of samples and normalize, or should I take the maximum for each sample and normalize each individually?

    Also, when translating new images, what would be the values of image? Would it be between -1 to 1? If yes, how should I “denormalize” the values to the original? Thank you for your help!

    • Jason Brownlee November 26, 2019 at 4:10 pm #


      I recommend selecting a min and max that you expect to see for all time and use that to scale the data. If not known or it cannot be known, use the limits of the domain (e.g. 0 and 255).
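A minimal sketch of this kind of scaling, assuming NumPy arrays and a fixed range chosen in advance. The function names and the 0 to 4 range are illustrative assumptions, not part of the tutorial code. The same fixed minimum and maximum can be reused to reverse the scaling on generated images:

```python
import numpy as np

def scale_to_unit_range(x, lo, hi):
    # Map values from [lo, hi] to [-1, 1] using a fixed, pre-chosen range.
    x = np.asarray(x, dtype="float32")
    return (x - lo) / (hi - lo) * 2.0 - 1.0

def unscale_from_unit_range(x, lo, hi):
    # Reverse the mapping: [-1, 1] back to [lo, hi].
    x = np.asarray(x, dtype="float32")
    return (x + 1.0) / 2.0 * (hi - lo) + lo

# Hypothetical range expected across all images (an assumption for illustration).
lo, hi = 0.0, 4.0
image = np.array([[0.0, 1.0], [2.0, 4.0]])

scaled = scale_to_unit_range(image, lo, hi)
restored = unscale_from_unit_range(scaled, lo, hi)
```

Using one global range, rather than a per-image maximum, keeps the mapping reversible without storing extra values for each sample.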

      • Jack November 27, 2019 at 6:11 pm #

        Thank you for your suggestions! How would you suggest I “de-normalize” the data during testing? Should I use the same range (I am taking the range from the training data) and reverse the process on the test data?

  16. Syd Rawat December 1, 2019 at 2:50 pm #

    Hi Jason,

    Thank you very much for sharing such an in-depth analysis of the Pix2Pix GAN. It is really helpful for early career researchers like me who don’t have a CS background. I thought of applying this to solving inverse problems in Digital Holographic Microscopy, and I am now intrigued by the preliminary results I have got. As you know, the output of the model is a translated image, hence it is not possible to calculate the model accuracy. I am looking for an image quality metric such as SSIM. Do you have any suggestions?

    Thank You,

    PS: As this post helped me enormously, I would like to cite your works on GANs in the future.

  17. Anthony December 15, 2019 at 1:24 am #

    Hi! I am currently implementing this with images of shape (256, 512, 3) and keep running into an error as follows:

    “ValueError: A target array with shape (1, 16, 16, 1) was passed for an output of shape (None, 16, 32, 1) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output.”

    I assume that this is due to the downsampling? Any help would be appreciated

    • Jason Brownlee December 15, 2019 at 6:08 am #

      Perhaps start with square images, get it working, then try changing to rectangular images?
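One likely cause of the mismatch can be sketched as follows, assuming the discriminator downsamples with four stride-2 convolutions using 'same' padding and that the real/fake labels are built as a square patch array. The helper name `patch_output_shape` is hypothetical:

```python
def patch_output_shape(height, width, n_stride2_layers=4):
    # Each stride-2 convolution with 'same' padding halves each dimension
    # (rounding up), so the patch output keeps the input's aspect ratio.
    for _ in range(n_stride2_layers):
        height = -(-height // 2)  # ceiling division
        width = -(-width // 2)
    return height, width

square = patch_output_shape(256, 256)  # square input
rect = patch_output_shape(256, 512)    # rectangular input
```

Under these assumptions a 256×512 input yields a 16×32 patch output, so the label arrays would need shape (n_samples, 16, 32, 1) rather than (n_samples, 16, 16, 1).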

      • Anthony Mäkelä December 15, 2019 at 6:21 am #

        Hmm, alright! Could you explain why you use target_size=(256, 512) instead of (256, 256)?

        • Jason Brownlee December 16, 2019 at 6:03 am #

          The images are 256×512 – as they contain the input and output images together.

          We load them and split them into two separate square images 256×256 when working with the GAN.
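The split itself is a simple array slice. A minimal sketch with NumPy, assuming the satellite image is on the left half and the map on the right (the function name `split_pair` and the dummy data are illustrative):

```python
import numpy as np

def split_pair(combined):
    # combined has shape (height, 2 * width, channels); cut it down the middle.
    half = combined.shape[1] // 2
    left, right = combined[:, :half], combined[:, half:]
    return left, right

# Dummy 256x512 image: left half black, right half white.
combined = np.zeros((256, 512, 3), dtype="uint8")
combined[:, 256:] = 255

sat_img, map_img = split_pair(combined)
```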

  18. MK December 16, 2019 at 11:45 pm #

    The discriminator error seems to be going to zero pretty quick, any tips to avoid this?

    • Jason Brownlee December 17, 2019 at 6:36 am #

      Perhaps try running the example a few times and continue running – it may recover.

  19. Avdudia December 17, 2019 at 4:58 am #

    Thank you for this tutorial and simple code. I used it to perform image-to-image translation from Köppen–Geiger climate classification map ( ) to real satellite data, with truly amazing results, but I have a question.

    In my strategy I create near one thousand pairs of 256×256 tiles from the Köppen–Geiger map (present in the Wikipedia article above), and a high-resolution satellite map of the Earth. In order to minimize deformation on tiles pairs near poles I use orthographic projection. This gives me nice pairs of image for GAN training (see ).

    I trained the GAN until the end (n_epochs=100) with amazing results. Using training data give truly convincing satellite map validation ( Even with hand-painted or with source image converted from a random image into Köppen–Geiger colormap, results are very nice (

    However, I noticed that the result lacked a “relief” effect. Moreover, on large landmasses where the climate does not change but the topography noticeably affects the satellite view (e.g. the Tibetan Plateau or the Grand Canyon), the model produces “flat” satellite views.

    As the climate map is composed of only 29 different indexed colors (plus the one I added for oceans), a simple label-to-image translation could be used, instead of using a full RGB climate image as input.

    So my idea was to store a heightmap of the earth on the first channel of the input image, and the normalized indexed climate color on the second channel. The third channel is kept unused. It results in a Red-Green image where the Red channel is the heightmap and the Green channel is normalized climate index (see
    The problem is that training with these input images gives bad results compared to my first try (only climate data). Results were already convincing after 30 epochs in my first try, with smooth transitions between climates, while here the boundaries are clearly visible in the generated images (see ).
    I tried to run the training several times to ensure that it was not purely bad luck, with the same result.

    I don’t understand because climate index can clearly be stored on one channel without information loss, and heightmap provides additional data, so it should improve the results. Is it simply because it needs more epochs ?

    Thank you in advance and sorry for the long post and for my english, it is not my native language.

    • Jason Brownlee December 17, 2019 at 6:42 am #

      Well done!

      Very cool application.

      Two thoughts off the cuff. One would be to make image synthesis conditional on two input images (source and the height map). A second would be to have 2 steps – one step to generate the initial translation and a second to improve the generated image with a height map.

      I’m eager to hear how you go!

      • Avdudia December 18, 2019 at 7:32 am #

        Thank you very much! The aim is to develop a tool for worldbuilding and create realistic maps of imaginary planets (following Artifexian’s tutorials ).

        I used the idea of using R and G channels for heightmap and climate following this thread concerning the pytorch implementation : They recommend to concatenate the input images, but it seems that your code is limited to 3 channels and as I’m a complete beginner I still don’t know how to use more than one image as input.

        However, it seems that training for more epochs actually gives good results with my method. Maybe 100 is not enough, so I restarted it with a limit of 1000 epochs. However, I have to redo the first 100, and I run the code on Google Colab, which seems to be very unstable (I only managed to reach 100 epochs twice).

        Do you have a tutorial on how to make complete checkpoints in order to continue the training in case of a crash? If I understand correctly, your summarize_performance function only saves the generator model, so we would have to save the whole gan_model and reload it for later training. Do you have documentation or examples concerning this?

        Thank you so much for your tutorial. I’ll keep you informed on later developments !

        • Jason Brownlee December 18, 2019 at 1:28 pm #

          Yes, the above code already saves the model every n steps.

          See the summarize_performance() function.

  20. JS December 18, 2019 at 2:06 am #

    Should i approach this the same way if i have images containing white backgrounds? Similar to the edges2shoes dataset?

    • Jason Brownlee December 18, 2019 at 6:10 am #

      Perhaps test it with a prototype?

      • JS December 18, 2019 at 8:01 pm #

        For some reason i end up only with blank white images…

  21. Sirine December 25, 2019 at 5:56 am #

    Thanks for the wonderful tutorial. Please, how can I adapt the generator and the discriminator in order to translate from a matrix (2,64) into a matrix (2,64)?

  22. Elbek Khoshimjonov January 2, 2020 at 7:59 pm #

    Thanks Jason for great post!
    I have tried this code, but images do not appear to be good enough, and discriminator loss becomes 0 after 10-15 epochs.

    • Jason Brownlee January 3, 2020 at 7:27 am #

      Perhaps try using a different final model or re-fit the model?

  23. Arsal January 28, 2020 at 10:55 pm #

    I want to continue training from the last checkpoint stored. Can you help me in resuming the training of a model from last checkpoint.

    • Jason Brownlee January 29, 2020 at 6:36 am #

      Yes, load the model as per normal and call the train() function.

  24. Arsal January 28, 2020 at 10:57 pm #

    I have a dataset consisting of 216 images. I trained for 100 epochs but unfortunately the results are not good. Can you help me how can I improve the results?

  25. Laura January 30, 2020 at 3:19 am #

    Thank you for this wonderful tutorial! It has been extremely helpful. I was wondering if you had considered data augmentation?

  26. Laura January 30, 2020 at 3:43 am #


    This is probably a newbie question, but I am new to GANs. In my limited experience with deep CNNs, I used the validation data during the training process to sort of evaluate how well it was “learning”. I then had another dataset I called the “test” dataset that I used after the training process was complete. Here it seems like you don’t use any validation during the training process. And what you call validation is what I call the test dataset. Is that something unique to GANs or can validation be included in the training process?

  27. Michael February 1, 2020 at 12:34 am #

    Hi! Im running this on Titan V and it seems to be running extremely slow. Any ideas as to why?

    • Jason Brownlee February 1, 2020 at 5:56 am #

      Training GANs is slow…

      Perhaps check you have enough RAM and GPU RAM?
      Perhaps compare results on an ec2 with a lot of RAM?
      Perhaps adjust model config to use less RAM?

  28. pramod kumar February 9, 2020 at 2:42 am #

    sir can i know how you downloaded the images of satellite and maps can you please help me to download my own dataset for this project

    • Jason Brownlee February 9, 2020 at 6:24 am #

      See the section of the above tutorial “Satellite to Map Image Translation Dataset” on how to download the dataset.

  29. Vaibhav Vijay kotwal February 24, 2020 at 6:28 pm #

    Hi Jason,
    From the theory, we understand that the discriminator learns the objective loss function. However, referring to define_GAN(), line 56 in the code, I am not able to see the objective loss learnt by the discriminator getting passed to the GAN model. I see that the model does not converge as expected.

    Thanks and Regards,
    Vaibhav Kotwal

  30. Fırat Erdem February 26, 2020 at 10:49 pm #

    Thanks for the great tutorial. I need help with something. I want to see accuracy metrics for both the train dataset and the test dataset throughout the training process. And I want to see this for each epoch, not for each step, like a standard CNN model training procedure. How can I add these things to the code? I couldn’t apply it because it is different from standard CNN code. I would really appreciate it if you could answer. Thank you.

  31. Tom March 8, 2020 at 5:13 am #

    Can a Pix2Pix GAN be saved and loaded again without training again?

  32. Ehsan Karimi March 12, 2020 at 2:10 am #

    Hi Jason,
    Thanks for the great tutorial. I have a problem with image scales. In the first step, after splitting the input images, I check the image size: instead of 256*256 pixels they are 134*139 with background. Also, when translating a given standalone image with the model, the output should be 256*256, the same as the input, but I get a 252*262 output, again with background.
    I was wondering if you would mind letting me know where is the problem?
    Thanks in advance

  33. Paolo March 13, 2020 at 8:13 am #

    Great work Jason. Just one question: do you believe that this approach could work using a RGB satellite image against its mask image, to make some kind of image segmentation ?
    Thanks in advance

    • Jason Brownlee March 13, 2020 at 8:24 am #

      Perhaps try it. Prototypes are a fast way to get answers.

      • Paolo March 13, 2020 at 8:31 am #

        I mean, the mask image would have just 2 colors (yes/not) … this was my concern. Thanks

  34. Iqbal March 19, 2020 at 9:05 pm #

    This is an interesting article. I am new to this, and I have some questions about the first script that I would like explained:

    1. I saw the loaded data is maps in train and test folder. I want to know which 3 sample was loaded from the folder train? because the results was : (1096, 256, 256, 3) (1096, 256, 256, 3). I understand 1096 is the certain amount of image in that folder. and 256 I still dont understand because when I open picture 256 is not the same as the it was loaded.

    2. I saw the folder contain image in train and test. I want to ask the train, example 1.jpg it contains two image from satellite. May I know how to develop the left picture and the right picture or it develop itself? Also in test does it develop itself or have to save it first?

    Need some explanation for preparing using it in the future. Thank you

  35. yasser March 22, 2020 at 10:12 am #

    hi thank you for your work !
    I need your help. I need the same model, but the input of the generator is one channel and not three.
    I have tried to change it but it doesn’t work. Thank you.

    • Jason Brownlee March 23, 2020 at 6:09 am #

      Sorry to hear that.

      Perhaps confirm that your images are grayscale (1 channel), then change the model to expect 1 channel via the input shape.
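A minimal sketch of the data side of that change, assuming the grayscale images are loaded as a NumPy array without a channel axis and that the model-definition functions take an `image_shape` argument (both are assumptions here; the random array stands in for real data):

```python
import numpy as np

# Stand-in for 5 loaded grayscale images with no channel axis.
gray_images = np.random.rand(5, 256, 256).astype("float32")

# Add an explicit channel axis so each image is (256, 256, 1).
gray_images = np.expand_dims(gray_images, axis=-1)

# Shape to pass to the model definition instead of (256, 256, 3).
image_shape = gray_images.shape[1:]
```

If the target images are also single channel, the number of filters in the generator's output layer would presumably need to match as well.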

  36. runnergirl March 23, 2020 at 5:53 am #

    Hi! I wanted to train myself. I prepared them just like in this tutorial. Size X1 and X2 are the same. Data display works. But I get this error:

    ‘Got inputs shapes: %s’ % (input_shape))
    ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 2, 2, 512), (None, 1, 1, 512)]

    What have I done wrong?

  37. ouis March 24, 2020 at 1:04 am #

    hello I have this problem I dont know why :
    ValueError: Graph disconnected: cannot obtain value for tensor Tensor(“input_14:0”, shape=(?, 256, 256, 3), dtype=float32) at layer “input_14”. The following previous layers were accessed without issue: [‘input_15’]

    • Jason Brownlee March 24, 2020 at 6:05 am #

      Perhaps confirm that your keras and tensorflow versions are updated?

  38. Harshit April 19, 2020 at 2:37 am #

    Hi. I tried to use a different dataset using this code. Specifically the edges2shoes dataset but i was not able to convert it into npz file. Everytime i ran into memory error. My ram is 16GB still that was not enough. I managed to create multiple npz files though. How should i proceed?

    Also could you be kind enough to make tutorial of Tensorflow/Keras of pix2pixHD since it is much more accurate and better in side by side tests compared to normal pix2pix.

  39. Phil April 30, 2020 at 11:23 pm #

    Hi Jason. Great Article. Good explanation. Your articles gave me a good overview and starting point when I started developing my own networks. But I have two questions:
    1.) From what I can see, the original code on GitHub seems to be slightly different from your code in this article when it comes to how you connect an encoder and a decoder layer. On GitHub, data passed from an encoder layer to a decoder layer (via skip-connection) is unactivated, meaning that the data is passed directly after the convolution (or batch norm/dropout), contrary to the solution here. Is this a mistake or a variation?
    2.) What does the flag ‘training=True’ do when calling batch normalization layer or dropout layer ?
    Thanks in advance.

  40. ElenaRR May 3, 2020 at 4:08 am #

    Hi Jason,
    thank you very much for this tutorial, it’s awesome!
    I have the problem that you mentioned at the end of the article. D1 loss goes to zero after 80-90 steps. Could you explain me why this happens and how can I solve it?

    In addition to this, I can see that only one image is used in every iteration (one real, one fake) where n_batch = 1. Shouldn’t we use more than one pair of images to train in each step?

    # select a batch of real samples
    [X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)

    Thank you very much!

  41. ElenaRR May 3, 2020 at 7:39 pm #

    I’ve seen that there is specific code in your book about this. Is it a more complete example for Pix2Pix? Or is it the same?

    • Jason Brownlee May 4, 2020 at 6:19 am #

      The examples of pix2pix in the book are based on this example.

  42. shami May 11, 2020 at 1:02 am #

    long live 100 years …. superb tutorial with clear explanation

  43. bobbyP May 14, 2020 at 2:29 pm #

    Is there any way to use a Keras data loader/generator for on-the-fly image loading during training? Say, for example, your training set is really large and loading it all at once would result in out-of-memory errors? Thanks so much for all your tutorials, they are incredible!

  44. cans May 20, 2020 at 6:12 am #

    Hi Sir,
    First of all i read your all tutorials. You are helping me more than my consultant. Thank you so much.
    I am new to GANs. Sorry for my questions, but I can’t understand how to test this model.
    I will use the validation set, okay.
    After training the model, I won’t give a target, just a source image? I can’t get it.
    Or do I only load the model and train with the validation set?
    Omg, I can’t explain myself. I hope you understand me.

    • Jason Brownlee May 20, 2020 at 6:29 am #

      You’re welcome!

      Evaluating GANs is challenging. We do not use a validation set. Instead we generate images and look at them and see if they are good enough.

  45. cans May 21, 2020 at 2:39 am #

    thank you for your answer Sir,
    i wanna try pix2pix gans for image enhancement.
    I’ll use low-contrast source images and high-contrast targets. What do you think? I’m trying to improve thermal images.
    I hope this system will work.

  46. Gruhit Patel May 25, 2020 at 1:44 pm #

    Here in Discriminator what is the need for Concatenating Source and Target images ? What effect would it have ?

    • Jason Brownlee May 26, 2020 at 6:12 am #

      Sorry, which section are you referring to? Where (which section/line?) do we concatenate images?

      • Gruhit Patel May 27, 2020 at 2:48 pm #

        In discriminator. where we are concatenating Source Image and target Image.
        Actually I’m building a GAN model color transformation from Gray to RGB.
        My discriminator and Generator model’s loss falls to zero. So wanted to know that what particular effect does the Concatenation have for discriminator. And if you have any advice for my model than tell it too.
        THANKS in advance…

        • Jason Brownlee May 28, 2020 at 6:09 am #

          Here we are training a conditional model, e.g. generate a target image conditional on the source image.

          E.g. it is the purpose of the model.

  47. Ujjayant Sinha May 27, 2020 at 12:54 am #

    Hello. I compared the images from the summarize_performance() to predictions on unseen ones, which turned out to be quite horrible. Can you suggest some ways to tackle this problem ?

    • Jason Brownlee May 27, 2020 at 7:58 am #

      Try training the model a few times, save many times during each run, choose a model that generates good images.

  48. Zsolt Lipcsei May 27, 2020 at 8:50 am #


    How can I use the generator model to predict images of any size? I mean not just square sizes. Is there any way at all?

    • Jason Brownlee May 27, 2020 at 1:26 pm #

      You will have to change the generator/discriminator and also the training dataset to the desired size.

  49. Muhammad Ammar Malik June 12, 2020 at 2:49 am #

    Thank you for the awesome post. I have 2 questions if you can answer please.

    First question, you have mentioned:
    “In this case, we will use the model saved at the end of the run, e.g. after 10 epochs or 109,600 training iterations.”

    Shouldn’t the training iterations be 10,960 after 10 epochs.

    Second question, what is the rationale behind using random index to generate real and fake samples? Why can’t we simply iterate over all the samples one by one to make sure no image is missed or used more than once in 1 training step?

    • Jason Brownlee June 12, 2020 at 6:18 am #

      Images are generated after every 10 epochs, it runs for 100 epochs, meaning we save 10 models along the way. Yes, that is a typo, we used the model after 100 epochs. Fixed.

      We can do it for all images. I wanted to work with one image to show that we can use the model ad hoc. Readers often find that step confusing, so I must demonstrate it.
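The iteration counts in question follow from simple arithmetic, sketched below assuming 1,096 training images and a batch size of 1 (the function name is illustrative):

```python
def training_steps(n_images, n_epochs, n_batch=1):
    # Steps per epoch is the number of batches needed to cover the dataset once.
    batches_per_epoch = n_images // n_batch
    return batches_per_epoch * n_epochs

steps_10 = training_steps(1096, 10)    # after 10 epochs
steps_100 = training_steps(1096, 100)  # after 100 epochs
```

So 10,960 steps corresponds to 10 epochs and 109,600 steps to the full 100 epochs.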

  50. Riccardo June 18, 2020 at 11:27 pm #

    thank you for the great tutorial, you helped me a lot!

    i have just a question: am i doing something wrong or is it normal that for a X input i do not have a unique Y output.
    Let me explain better: if i repeat n times the prediction i get n different Y images (i’m checking pixels differences).
    I’m translating this example to another application, and having the exact same output every time will make it work.

    I tried to look for a random noise vector or something like that but it seems that this is not the case.

  51. Riccardo June 18, 2020 at 11:29 pm #

    ” I’m translating this this example to another application and having the exact same output will make it works” *

    • Riccardo June 18, 2020 at 11:31 pm #

      this tutorial helped me a lot!

      i have just a question: is it normal that if i repeat n times the prediction, with the same input, i have n different outputs?

      I’m checking directly changes in pixel values in Y outputs.

      • Jason Brownlee June 19, 2020 at 6:15 am #

        Yes, this is expected given the stochastic nature of some of the layers.

        • Riccardo June 19, 2020 at 5:35 pm #

          thank you for the reply!
          I’m guessing that dropouts are introducing randomization, i’ll try without that.

          • Jason Brownlee June 20, 2020 at 6:08 am #

            Yes, also there are layers that inject noise directly.

          • Riccardo June 22, 2020 at 6:09 pm #

            i’m sorry if i seem annoying but i do not see layers that can introduce directly noise in the code you provided.

            I see convs, batchnorms and concatenates.
            Can you please tell me which layer is introducing noise directly?

            i think that i have missed something about these layers but reading through the documentation it seems like i know them pretty good.

            I really need to understand the position of these noise generator and remove them in order to use a GAN for my application (maybe it could be impossible but i wish to try =) )

          • Jason Brownlee June 23, 2020 at 6:16 am #

            Sorry, my mistake, I was thinking of a different GAN.
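The run-to-run variation discussed in this thread is consistent with dropout remaining active at prediction time. Below is a NumPy sketch of inverted dropout (not the Keras implementation; the function and variable names are illustrative) showing that the same input gives different outputs unless the random state is fixed:

```python
import numpy as np

def dropout(x, rate, rng):
    # Inverted dropout: zero a random subset and rescale the survivors.
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

features = np.ones((8, 8))

# Two calls sharing one generator draw different masks.
rng = np.random.default_rng(1)
out_a = dropout(features, 0.5, rng)
out_b = dropout(features, 0.5, rng)

# Re-seeding reproduces the first mask exactly.
out_c = dropout(features, 0.5, np.random.default_rng(1))
```

Disabling dropout at prediction time, or fixing the random seed, would be two ways to try for repeatable outputs, at the possible cost of changing the generated images.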

  52. Sahil Singla June 22, 2020 at 7:17 am #

    Thanks for the great tutorial.

    I have one small doubt:
    Do we traverse over the complete dataset?
    We passed our entire dataset to the generate_real_samples function and everytime it chooses a random number, which could be same, if we traverse again and again.

    So, we might not be traversing over the complete dataset in single epoch?

    Please let me know your thoughts.


    • Jason Brownlee June 22, 2020 at 1:25 pm #

      You’re welcome.

      Correct. On average we cover the whole dataset many times.

      • Sahil Singla June 22, 2020 at 6:20 pm #

        So, there is a possibility of missing certain data points. This can become a problem if you have very few data points to work with.

        So should I change the code to make sure, it traverse over entire data, or is it still ok, if we don’t do that ?

        • Jason Brownlee June 23, 2020 at 6:17 am #

          If you prefer. I’m not convinced it makes a difference, but could be a fun experiment.
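For anyone who wants to run that experiment, here is a minimal sketch of an alternative sampler that visits every sample exactly once per epoch by shuffling indices. This swaps in a different technique from random sampling with replacement; the function name `epoch_indices` is hypothetical:

```python
import numpy as np

def epoch_indices(n_samples, n_batch, rng):
    # Shuffle all indices once, then hand them out in consecutive batches.
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, n_batch):
        yield order[start:start + n_batch]

rng = np.random.default_rng(0)
batches = list(epoch_indices(10, 3, rng))
seen = np.concatenate(batches)
```

Each yielded index array could then be used to select the matching source and target images for one training step.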

    • Riccardo June 23, 2020 at 6:08 pm #

      oh, ok no problem!
      i think that i will investigate stochasticity through the different convs and batch norms in order to make the net able to predict the same Y from an X input.

      best regards

  53. yacine June 30, 2020 at 7:30 pm #

    Thank you so much for this super clear explanation and code.

  54. Steve Newbold July 3, 2020 at 11:59 pm #

    If I wanted to use an input with three colour channels and a target of four colour channels, can this be configured or is it best to just create an additional black 4th channel on the input?

    I noticed some greyscale-to-colour models just use the same data in each channel to represent grey images, so I presumed it must be easier to do this than to make the model work with differing numbers of channels.

    Also, thanks for the excellent resource!

    • Jason Brownlee July 4, 2020 at 6:01 am #

      Off the cuff I recall the images have to have the same number of channels. Perhaps experiment/research and see if you can deviate from this norm.

  55. M July 8, 2020 at 10:42 am #

    Thanks for this great post!
    For your generator’s loss, how can I know if you are minimizing 1: log(1 – D(G(x))) or maximizing 2: log D(G(x))?
    How can one change the loss function, any reading suggestions?
    Some people say the choice of generator’s loss can help the model to not get stuck in early stages of training.
