How to Develop a Pix2Pix GAN for Image-to-Image Translation

The Pix2Pix Generative Adversarial Network, or GAN, is an approach to training a deep convolutional neural network for image-to-image translation tasks.

The careful configuration of the architecture as a type of image-conditional GAN allows for both the generation of large images compared to prior GAN models (e.g. 256×256 pixels) and good performance on a variety of different image-to-image translation tasks.

In this tutorial, you will discover how to develop a Pix2Pix generative adversarial network for image-to-image translation.

After completing this tutorial, you will know:

  • How to load and prepare the satellite image to Google maps image-to-image translation dataset.
  • How to develop a Pix2Pix model for translating satellite photographs to Google map images.
  • How to use the final Pix2Pix generator model to translate ad hoc satellite images.

Let’s get started.

  • Updated Jan/2021: Updated so layer freezing works with batch norm.

How to Develop a Pix2Pix Generative Adversarial Network for Image-to-Image Translation
Photo by European Southern Observatory, some rights reserved.

Tutorial Overview

This tutorial is divided into five parts; they are:

  1. What Is the Pix2Pix GAN?
  2. Satellite to Map Image Translation Dataset
  3. How to Develop and Train a Pix2Pix Model
  4. How to Translate Images With a Pix2Pix Model
  5. How to Translate Google Maps to Satellite Images

What Is the Pix2Pix GAN?

Pix2Pix is a Generative Adversarial Network, or GAN, model designed for general purpose image-to-image translation.

The approach was presented by Phillip Isola, et al. in their 2016 paper titled “Image-to-Image Translation with Conditional Adversarial Networks” and presented at CVPR in 2017.

The GAN architecture is comprised of a generator model for outputting new plausible synthetic images, and a discriminator model that classifies images as real (from the dataset) or fake (generated). The discriminator model is updated directly, whereas the generator model is updated via the discriminator model. As such, the two models are trained simultaneously in an adversarial process where the generator seeks to better fool the discriminator and the discriminator seeks to better identify the counterfeit images.

The Pix2Pix model is a type of conditional GAN, or cGAN, where the generation of the output image is conditional on an input, in this case, a source image. The discriminator is provided both with a source image and the target image and must determine whether the target is a plausible transformation of the source image.

The generator is trained via adversarial loss, which encourages the generator to generate plausible images in the target domain. The generator is also updated via L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image.

The Pix2Pix GAN has been demonstrated on a range of image-to-image translation tasks such as converting maps to satellite photographs, black and white photographs to color, and sketches of products to product photographs.

Now that we are familiar with the Pix2Pix GAN, let’s prepare a dataset that we can use with image-to-image translation.

Satellite to Map Image Translation Dataset

In this tutorial, we will use the so-called “maps” dataset used in the Pix2Pix paper.

This is a dataset comprised of satellite images of New York and their corresponding Google maps pages. The image translation problem involves converting satellite photos to Google maps format, or the reverse, Google maps images to satellite photos.

The dataset is provided on the pix2pix website and can be downloaded as a 255-megabyte zip file.

Download the dataset and unzip it into your current working directory. This will create a directory called “maps” containing two subdirectories: train and val.

The train folder contains 1,097 images, whereas the validation dataset contains 1,099 images.

Images have a digit filename and are in JPEG format. Each image is 1,200 pixels wide and 600 pixels tall and contains both the satellite image on the left and the Google maps image on the right.

Sample Image From the Maps Dataset Including Both Satellite and Google Maps Image.

We can prepare this dataset for training a Pix2Pix GAN model in Keras. We will just work with the images in the training dataset. Each image will be loaded, rescaled, and split into the satellite and Google map elements. The result will be 1,097 color image pairs with the width and height of 256×256 pixels.

The load_images() function below implements this. It enumerates the list of images in a given directory, loads each with the target size of 256×512 pixels, splits each image into satellite and map elements and returns an array of each.
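As a rough illustration of the split step only, the sketch below divides one combined array into its satellite and map halves. It assumes the image has already been loaded and resized to 256×512 (height × width); the split_pair() helper and the synthetic array are hypothetical stand-ins, not the tutorial's load_images() function.

```python
import numpy as np

def split_pair(pixels):
    """Split one combined image array into (satellite, map) halves.

    Assumes `pixels` has shape (height, width, channels) with the
    satellite photo in the left half and the map in the right half.
    """
    half = pixels.shape[1] // 2
    sat_img, map_img = pixels[:, :half], pixels[:, half:]
    return sat_img, map_img

# Stand-in for an image already loaded and resized to 256x512 (hypothetical data).
combined = np.zeros((256, 512, 3), dtype=np.uint8)
combined[:, 256:] = 255  # mark the right (map) half so the split is visible

sat_img, map_img = split_pair(combined)
print(sat_img.shape, map_img.shape)
```

Repeating this for every file in the directory and stacking the halves would give the two arrays of 256×256 images described above.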

We can call this function with the path to the training dataset. Once loaded, we can save the prepared arrays to a new file in compressed format for later use.

The complete example is listed below.

Running the example loads all images in the training dataset, summarizes their shape to ensure the images were loaded correctly, then saves the arrays to a new file called maps_256.npz in compressed NumPy array format.

This file can be loaded later via the load() NumPy function and retrieving each array in turn.
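A minimal sketch of this save and load round trip is shown below, assuming the two arrays were saved positionally with NumPy's savez_compressed(). The tiny arrays and the temporary file name are placeholders chosen for illustration so that the sketch does not touch maps_256.npz.

```python
import os
import tempfile
import numpy as np

# Tiny stand-in arrays; the real arrays would hold the prepared image pairs.
src_images = np.zeros((2, 256, 256, 3), dtype=np.uint8)
tar_images = np.full((2, 256, 256, 3), 255, dtype=np.uint8)

# A temporary path is used here so the sketch does not overwrite maps_256.npz.
path = os.path.join(tempfile.mkdtemp(), "maps_256_demo.npz")
np.savez_compressed(path, src_images, tar_images)

# Arrays saved positionally are stored under the keys arr_0, arr_1, ...
with np.load(path) as data:
    loaded_src, loaded_tar = data["arr_0"], data["arr_1"]
print("Loaded:", loaded_src.shape, loaded_tar.shape)
```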

We can then plot some image pairs to confirm the data has been handled correctly.

Running this example loads the prepared dataset and summarizes the shape of each array, confirming our expectations of a little over one thousand 256×256 image pairs.

A plot of three image pairs is also created showing the satellite images on the top and Google map images on the bottom.

We can see that satellite images are quite complex and that although the Google map images are much simpler, they have color codings for things like major roads, water, and parks.

Plot of Three Image Pairs Showing Satellite Images (top) and Google Map Images (bottom).

Now that we have prepared the dataset for image translation, we can develop our Pix2Pix GAN model.

How to Develop and Train a Pix2Pix Model

In this section, we will develop the Pix2Pix model for translating satellite photos to Google maps images.

The same model architecture and configuration described in the paper was used across a range of image translation tasks. The architecture is described in the body of the paper, with additional detail in the appendix, and a fully working implementation is provided as open source with the Torch deep learning framework.

The implementation in this section will use the Keras deep learning framework based directly on the model described in the paper and implemented in the author’s code base, designed to take and generate color images with the size 256×256 pixels.

The architecture is comprised of two models: the discriminator and the generator.

The discriminator is a deep convolutional neural network that performs image classification, specifically conditional-image classification. It takes both the source image (e.g. satellite photo) and the target image (e.g. Google maps image) as input and predicts the likelihood of whether the target image is real or a fake translation of the source image.

The discriminator design is based on the effective receptive field of the model, which defines the relationship between one output of the model and the number of pixels in the input image. This is called a PatchGAN model and is carefully designed so that each output prediction of the model maps to a 70×70 square or patch of the input image. The benefit of this approach is that the same model can be applied to input images of different sizes, e.g. larger or smaller than 256×256 pixels.
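The receptive field of a stack of convolutional layers can be worked out by walking backward from a single output value. The sketch below does this for an assumed layer layout (three 4×4 convolutions with stride 2 followed by two 4×4 convolutions with stride 1); this layout is an assumption for illustration and is not read from the tutorial's define_discriminator() code.

```python
def receptive_field(layers):
    """Return the input patch size seen by one output value.

    `layers` is a list of (kernel_size, stride) tuples ordered from the
    input to the output of the network.
    """
    size = 1
    for kernel, stride in reversed(layers):
        # Going back one layer, each output covers (size - 1) strides plus one kernel.
        size = (size - 1) * stride + kernel
    return size

# Assumed layout: three stride-2 convolutions, then two stride-1
# convolutions (the last one producing the patch output), all 4x4.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))
```

With this layout the function returns 70. Adding a fourth stride-2 layer to the same list would give 142, so the calculation is a quick way to check any variation of the design.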

The output of the model depends on the size of the input image but may be one value or a square activation map of values. Each value is a probability for the likelihood that a patch in the input image is real. These values can be averaged to give an overall likelihood or classification score if needed.

The define_discriminator() function below implements the 70×70 PatchGAN discriminator model as per the design of the model in the paper. The model takes two input images that are concatenated together and predicts a patch output of predictions. The model is optimized using binary cross entropy, and a weighting is used so that updates to the model have half (0.5) the usual effect. The authors of Pix2Pix recommend this weighting of model updates to slow down changes to the discriminator, relative to the generator model during training.

The generator model is more complex than the discriminator model.

The generator is an encoder-decoder model using a U-Net architecture. The model takes a source image (e.g. satellite photo) and generates a target image (e.g. Google maps image). It does this by first downsampling or encoding the input image down to a bottleneck layer, then upsampling or decoding the bottleneck representation to the size of the output image. The U-Net architecture means that skip-connections are added between the encoding layers and the corresponding decoding layers, forming a U-shape.
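To make the U-shape concrete, the sketch below traces the spatial size of the feature maps through the encoder, bottleneck, and decoder, and pairs each decoder block with the encoder block it would be connected to. The count of seven encoder blocks is an assumption based on a 256×256 input being halved down to a 1×1 bottleneck; unet_sizes() is a hypothetical helper and builds no actual layers.

```python
def unet_sizes(input_size=256, n_encoder_blocks=7):
    """Trace spatial sizes through a U-Net whose blocks halve or double the size.

    Returns (encoder_sizes, bottleneck_size, decoder_sizes, output_size).
    """
    encoder = []
    size = input_size
    for _ in range(n_encoder_blocks):
        size //= 2            # each encoder block downsamples by 2
        encoder.append(size)
    bottleneck = size // 2    # one more downsampling step to the bottleneck
    decoder = []
    size = bottleneck
    for _ in range(n_encoder_blocks):
        size *= 2             # each decoder block upsamples by 2
        decoder.append(size)
    output = size * 2         # final upsampling back to the input size
    return encoder, bottleneck, decoder, output

encoder, bottleneck, decoder, output = unet_sizes()
# Skip connections pair each decoder block with the mirrored encoder block.
skips = list(zip(decoder, reversed(encoder)))
print("encoder:", encoder)
print("bottleneck:", bottleneck)
print("decoder:", decoder)
print("output:", output)
```

Each pair in skips has matching sizes, which is what allows the encoder output to be concatenated with the decoder output at the same depth.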

The image below makes the skip-connections clear, showing how the first layer of the encoder is connected to the last layer of the decoder, and so on.

Architecture of the U-Net Generator Model
Taken from Image-to-Image Translation With Conditional Adversarial Networks

The encoder and decoder of the generator are comprised of standardized blocks of convolutional, batch normalization, dropout, and activation layers. This standardization means that we can develop helper functions to create each block of layers and call them repeatedly to build up the encoder and decoder parts of the model.

The define_generator() function below implements the U-Net encoder-decoder generator model. It uses the define_encoder_block() helper function to create blocks of layers for the encoder and the decoder_block() function to create blocks of layers for the decoder. The tanh activation function is used in the output layer, meaning that pixel values in the generated image will be in the range [-1,1].

The discriminator model is trained directly on real and generated images, whereas the generator model is not.

Instead, the generator model is trained via the discriminator model. It is updated to minimize the loss predicted by the discriminator for generated images marked as “real.” As such, it is encouraged to generate more realistic images. The generator is also updated to minimize the L1 loss or mean absolute error between the generated image and the target image.

The generator is updated via a weighted sum of both the adversarial loss and the L1 loss, where the authors of the model recommend a weighting of 100 to 1 in favor of the L1 loss. This is to encourage the generator strongly toward generating plausible translations of the input image, and not just plausible images in the target domain.
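The weighted sum can be illustrated with plain NumPy. The sketch below computes binary cross entropy against “real” labels and mean absolute error against the target, then combines them with weights of 1 and 100. The function names and the toy inputs are assumptions for illustration; in practice a deep learning framework would compute these losses during training.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

def mean_absolute_error(y_true, y_pred):
    """Mean absolute (L1) error between two arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))

def generator_loss(patch_pred, fake_img, target_img, adv_weight=1.0, l1_weight=100.0):
    """Weighted sum of adversarial loss (labels all 'real') and L1 loss."""
    adv = binary_cross_entropy(np.ones_like(patch_pred), patch_pred)
    l1 = mean_absolute_error(target_img, fake_img)
    return adv_weight * adv + l1_weight * l1, adv, l1

# Toy inputs: an undecided discriminator and a slightly wrong generated image.
patch_pred = np.full((1, 16, 16, 1), 0.5)
fake_img = np.zeros((1, 256, 256, 3))
target_img = np.full((1, 256, 256, 3), 0.1)
total, adv, l1 = generator_loss(patch_pred, fake_img, target_img)
print("total=%.4f adv=%.4f l1=%.4f" % (total, adv, l1))
```

Because of the 100 to 1 weighting, even a small pixel error dominates the total, which is the intended pressure toward faithful translations.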

This can be achieved by defining a new logical model comprised of the weights in the existing standalone generator and discriminator model. This logical or composite model involves stacking the generator on top of the discriminator. A source image is provided as input to the generator and to the discriminator, although the output of the generator is connected to the discriminator as the corresponding “target” image. The discriminator then predicts the likelihood that the generated image was a real translation of the source image.

The discriminator is updated in a standalone manner, so the weights are reused in this composite model but are marked as not trainable. The composite model is updated with two targets: one indicating that the generated images were real (cross entropy loss), forcing large weight updates in the generator toward generating more realistic images, and the expected real translation of the image, which is compared against the output of the generator model (L1 loss).

The define_gan() function below implements this, taking the already-defined generator and discriminator models as arguments and using the Keras functional API to connect them together into a composite model. Both loss functions are specified for the two outputs of the model and the weights used for each are specified in the loss_weights argument to the compile() function.

Next, we can load our paired images dataset in compressed NumPy array format.

This will return a list of two NumPy arrays: the first for source images and the second for corresponding target images.

Training the discriminator will require batches of real and fake images.

The generate_real_samples() function below will prepare a batch of random pairs of images from the training dataset, and the corresponding discriminator label of class=1 to indicate they are real.

The generate_fake_samples() function below uses the generator model and a batch of real source images to generate an equivalent batch of target images for the discriminator.

These are returned with the label class=0 to indicate to the discriminator that they are fake.
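A minimal NumPy sketch of these two helpers is given below. It assumes the discriminator outputs a square patch of predictions, so the labels are arrays of ones or zeros with that patch shape. The patch size of 16, the random generator argument, and the stand-in generate callable are assumptions for illustration; a trained generator model would be used in its place.

```python
import numpy as np

def generate_real_samples(dataset, n_samples, patch_shape, rng):
    """Pick random source/target pairs and label them as real (1)."""
    src_images, tar_images = dataset
    idx = rng.integers(0, src_images.shape[0], n_samples)
    X1, X2 = src_images[idx], tar_images[idx]
    y = np.ones((n_samples, patch_shape, patch_shape, 1))
    return [X1, X2], y

def generate_fake_samples(generate, samples, patch_shape):
    """Translate source images with `generate` and label the results as fake (0)."""
    X = generate(samples)
    y = np.zeros((len(X), patch_shape, patch_shape, 1))
    return X, y

# Small stand-in dataset and generator (hypothetical, for illustration only).
rng = np.random.default_rng(0)
dataset = [rng.uniform(-1, 1, (5, 8, 8, 3)), rng.uniform(-1, 1, (5, 8, 8, 3))]
stand_in_generator = np.tanh  # any callable mapping a batch to a batch

[X_real_src, X_real_tar], y_real = generate_real_samples(dataset, 2, 16, rng)
X_fake_tar, y_fake = generate_fake_samples(stand_in_generator, X_real_src, 16)
print(X_real_src.shape, y_real.shape, X_fake_tar.shape, y_fake.shape)
```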

Typically, GAN models do not converge; instead, an equilibrium is found between the generator and discriminator models. As such, we cannot easily judge when training should stop. Therefore, we can save the model and use it to generate sample image-to-image translations periodically during training, such as every 10 training epochs.

We can then review the generated images at the end of training and use the image quality to choose a final model.

The summarize_performance() function implements this, taking the generator model at a point during training and using it to generate a number, in this case three, of translations of randomly selected images in the dataset. The source, generated image, and expected target are then plotted as three rows of images and the plot saved to file. Additionally, the model is saved to an H5 formatted file that makes it easier to load later.

Both the image and model filenames include the training iteration number, allowing us to easily tell them apart at the end of training.

Finally, we can train the generator and discriminator models.

The train() function below implements this, taking the defined generator, discriminator, composite model, and loaded dataset as input. The number of epochs is set at 100 to keep training times down, although 200 was used in the paper. A batch size of 1 is used as is recommended in the paper.

Training involves a fixed number of training iterations. There are 1,097 images in the training dataset. One epoch is one pass through all of these examples, which, with a batch size of one, means 1,097 training steps. The generator is saved and evaluated every 10 epochs, or every 10,970 training steps, and the model will run for 100 epochs, or a total of 109,700 training steps.
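The arithmetic above can be checked with a few lines of Python. The training_schedule() helper below is hypothetical and only computes the step counts and the steps at which a checkpoint would be written; it does not train anything.

```python
def training_schedule(n_images, n_epochs=100, n_batch=1, save_every_epochs=10):
    """Return (steps per epoch, total steps, list of checkpoint steps)."""
    bat_per_epo = n_images // n_batch
    n_steps = bat_per_epo * n_epochs
    save_interval = bat_per_epo * save_every_epochs
    checkpoints = list(range(save_interval, n_steps + 1, save_interval))
    return bat_per_epo, n_steps, checkpoints

bat_per_epo, n_steps, checkpoints = training_schedule(1097)
print("steps per epoch:", bat_per_epo)
print("total steps:", n_steps)
print("first and last checkpoint:", checkpoints[0], checkpoints[-1])
print("number of checkpoints:", len(checkpoints))
```

Inside a training loop, the same interval could be used as a simple modulo test on the step counter to decide when to save and evaluate the generator.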

Each training step involves first selecting a batch of real examples, then using the generator to generate a batch of matching fake samples using the real source images. The discriminator is then updated with the batch of real images and then fake images.

Next, the generator model is updated providing the real source images as input and providing class labels of 1 (real) and the real target images as the expected outputs of the model required for calculating loss. The generator has two loss scores as well as the weighted sum score returned from the call to train_on_batch(). We are only interested in the weighted sum score (the first value returned) as it is used to update the model weights.

Finally, the loss for each update is reported to the console each training iteration and model performance is evaluated every 10 training epochs.

Tying all of this together, the complete code example of training a Pix2Pix GAN to translate satellite photos to Google maps images is listed below.

The example can be run on CPU hardware, although GPU hardware is recommended.

The example might take about two hours to run on modern GPU hardware.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

The loss is reported each training iteration, including the discriminator loss on real examples (d1), discriminator loss on generated or fake examples (d2), and generator loss, which is a weighted sum of adversarial and L1 loss (g).

If loss for the discriminator goes to zero and stays there for a long time, consider re-starting the training run as it is an example of a training failure.
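One rough way to spot this failure automatically is to keep a history of discriminator losses and check whether the most recent values are all close to zero. The sketch below is a hypothetical helper; the window of 1,000 steps and the threshold of 0.001 are arbitrary assumptions, not values from the paper or this tutorial.

```python
def discriminator_collapsed(d_losses, window=1000, threshold=1e-3):
    """Return True if the last `window` discriminator losses are all near zero.

    `window` and `threshold` are illustrative defaults, not tuned values.
    """
    if len(d_losses) < window:
        return False
    return all(loss < threshold for loss in d_losses[-window:])

# Example histories: one healthy run and one that has gone to zero.
healthy = [0.4] * 2000
collapsed = [0.4] * 500 + [0.0] * 1500
print(discriminator_collapsed(healthy), discriminator_collapsed(collapsed))
```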

Models are saved every 10 epochs and saved to a file with the training iteration number. Additionally, images are generated every 10 epochs and compared to the expected target images. These plots can be assessed at the end of the run and used to select a final generator model based on generated image quality.

At the end of the run, you will have 10 saved model files and 10 plots of generated images.

After the first 10 epochs, map images are generated that look plausible, although the lines for streets are not entirely straight and images contain some blurring. Nevertheless, large structures are in the right places with mostly the right colors.

Plot of Satellite to Google Map Translated Images Using Pix2Pix After 10 Training Epochs

Generated images after about 50 training epochs begin to look very realistic, at least to me, and quality appears to remain good for the remainder of the training process.

Note the first generated image example below (right column, middle row) that includes more useful detail than the real Google map image.

Plot of Satellite to Google Map Translated Images Using Pix2Pix After 100 Training Epochs

Now that we have developed and trained the Pix2Pix model, we can explore how it can be used in a standalone manner.

How to Translate Images With a Pix2Pix Model

Training the Pix2Pix model results in many saved models and samples of generated images for each.

More training epochs do not necessarily mean a better quality model. Therefore, we can choose a model based on the quality of the generated images and use it to perform ad hoc image-to-image translation.

In this case, we will use the model saved at the end of the run, e.g. after 100 epochs or 109,600 training iterations.

A good starting point is to load the model and use it to make ad hoc translations of source images in the training dataset.

First, we can load the training dataset. We can use the same function named load_real_samples() for loading the dataset as was used when training the model.

This function can be called as follows:

Next, we can load the saved Keras model.

Next, we can choose a random image pair from the training dataset to use as an example.

We can provide the source satellite image as input to the model and use it to predict a Google map image.

Finally, we can plot the source, generated image, and the expected target image.

The plot_images() function below implements this, providing a nice title above each image.

This function can be called with each of our source, generated, and target images.

Tying all of this together, the complete example of performing an ad hoc image-to-image translation with an example from the training dataset is listed below.

Running the example will select a random image from the training dataset, translate it to a Google map, and plot the result compared to the expected image.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the generated image captures large roads with orange and yellow as well as green park areas. The generated image is not perfect but is very close to the expected image.

Plot of Satellite to Google Map Image Translation With Final Pix2Pix GAN Model

We may also want to use the model to translate a given standalone image.

We can select an image from the validation dataset under maps/val and crop the satellite element of the image. This can then be saved and used as input to the model.

In this case, we will use “maps/val/1.jpg”.

Example Image From the Validation Part of the Maps Dataset

We can use an image program to create a rough crop of the satellite element of this image to use as input and save the file as satellite.jpg in the current working directory.

Example of a Cropped Satellite Image to Use as Input to the Pix2Pix Model.

We must load the image as a NumPy array of pixels with the size of 256×256, rescale the pixel values to the range [-1,1], and then expand the single image dimensions to represent one input sample.

The load_image() function below implements this, returning image pixels that can be provided directly to a loaded Pix2Pix model.
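The pixel handling can be sketched with NumPy alone. The example below assumes the image has already been read and resized to a 256×256×3 array of values in [0,255]; to_model_input() and to_display_range() are hypothetical helper names, with the second showing the inverse scaling to [0,1] that is needed before plotting a generated image.

```python
import numpy as np

def to_model_input(pixels):
    """Scale [0,255] pixels to [-1,1] and add a leading sample dimension."""
    pixels = pixels.astype("float32")
    pixels = (pixels - 127.5) / 127.5
    return np.expand_dims(pixels, 0)

def to_display_range(generated):
    """Scale generator output from [-1,1] back to [0,1] for plotting."""
    return (generated + 1) / 2.0

# Stand-in for a loaded and resized satellite image (hypothetical data).
image = np.zeros((256, 256, 3), dtype=np.uint8)
image[128:] = 255

model_input = to_model_input(image)
print(model_input.shape, model_input.min(), model_input.max())
print(to_display_range(model_input).min(), to_display_range(model_input).max())
```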

We can then load our cropped satellite image.

As before, we can load our saved Pix2Pix generator model and generate a translation of the loaded image.

Finally, we can scale the pixel values back to the range [0,1] and plot the result.

Tying this all together, the complete example of performing an ad hoc image translation with a single image file is listed below.

Running the example loads the image from file, creates a translation of it, and plots the result.

The generated image appears to be a reasonable translation of the source image.

The streets do not appear to be straight lines and the detail of the buildings is a bit lacking. Perhaps with further training or choice of a different model, higher-quality images could be generated.

Plot of Satellite Image Translated to Google Maps With Final Pix2Pix GAN Model

How to Translate Google Maps to Satellite Images

Now that we are familiar with how to develop and use a Pix2Pix model for translating satellite images to Google maps, we can also explore the reverse.

That is, we can develop a Pix2Pix model to translate Google map images to plausible satellite images. This requires that the model invent or hallucinate plausible buildings, roads, parks, and more.

We can use the same code to train the model with one small difference. We can change the order of the datasets returned from the load_real_samples() function; for example:

Note: the order of X1 and X2 is reversed.
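A minimal sketch of such a loader is shown below. It assumes the prepared file stores the satellite images under arr_0 and the map images under arr_1, and scales both to [-1,1]. The reverse argument is a hypothetical addition for illustration (the text above describes simply swapping the returned order), and the small temporary file stands in for maps_256.npz.

```python
import os
import tempfile
import numpy as np

def load_real_samples(filename, reverse=False):
    """Load paired arrays, scale to [-1,1], and optionally swap their order."""
    with np.load(filename) as data:
        X1, X2 = data["arr_0"], data["arr_1"]
    X1 = (X1.astype("float32") - 127.5) / 127.5
    X2 = (X2.astype("float32") - 127.5) / 127.5
    # reverse=True returns [maps, satellite] so maps become the model input.
    return [X2, X1] if reverse else [X1, X2]

# Tiny stand-in file: satellite images all 0, map images all 255.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_pairs.npz")
np.savez_compressed(demo_path,
                    np.zeros((2, 8, 8, 3), dtype=np.uint8),
                    np.full((2, 8, 8, 3), 255, dtype=np.uint8))

source, target = load_real_samples(demo_path, reverse=True)
print(source.max(), target.min())
```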

This means that the model will take Google map images as input and learn to generate satellite images.

Run the example as before.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

As before, the loss of the model is reported each training iteration. If loss for the discriminator goes to zero and stays there for a long time, consider re-starting the training run as it is an example of a training failure.

It is harder to judge the quality of generated satellite images. Nevertheless, plausible images are generated after just 10 epochs.

Plot of Google Map to Satellite Translated Images Using Pix2Pix After 10 Training Epochs

As before, image quality will improve and will continue to vary over the training process. A final model can be chosen based on generated image quality, not total training epochs.

The model appears to have little difficulty in generating reasonable water, parks, roads, and more.

Plot of Google Map to Satellite Translated Images Using Pix2Pix After 90 Training Epochs


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Standalone Satellite. Develop an example of translating standalone Google map images to satellite images, as we did for satellite to Google map images.
  • New Image. Locate a satellite image for an entirely new location and translate it to a Google map and consider the result compared to the actual image in Google maps.
  • More Training. Continue training the model for another 100 epochs and evaluate whether the additional training results in further improvements in image quality.
  • Image Augmentation. Use some minor image augmentation during training as described in the Pix2Pix paper and evaluate whether it results in better quality generated images.

If you explore any of these extensions, I’d love to know.
Post your findings in the comments below.

Summary

In this tutorial, you discovered how to develop a Pix2Pix generative adversarial network for image-to-image translation.

Specifically, you learned:

  • How to load and prepare the satellite image to Google maps image-to-image translation dataset.
  • How to develop a Pix2Pix model for translating satellite photographs to Google map images.
  • How to use the final Pix2Pix generator model to translate ad hoc satellite images.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

365 Responses to How to Develop a Pix2Pix GAN for Image-to-Image Translation

  1. Deepanshu SIngh August 2, 2019 at 7:33 am

    Amazing tutorial. Detailed and clear explanation of concepts as well as the codes.

    Thanks & Regards

  2. Sean O'Connor August 3, 2019 at 10:38 am

    From a digital signal processing viewpoint a weighted sum is an adjustable filter.
    Each layer in a conventional artificial neural network has n of those filters and the total compute is a brutal n squared fused multiply accumulates.
    A fast Fourier transform is a fixed (nonadjustable) bank of filters, where each filter picks out frequency/phase.
    There are other transforms that act as filter banks too, such as the fast Walsh Hadamard transform and these often require far less compute (eg. nlog(n)) than a filter bank of weighed sums.
    The question then is why not use an efficient transform based filter bank and adjust the nonlinear functions in a neural network by individually parameterizing them?
    Ie. change what you adjust:

    • Jason Brownlee August 4, 2019 at 6:21 am

      Perhaps test your alternate approach and compare the results Sean?

  3. Villem Lassmann August 6, 2019 at 7:41 pm

    It seems to me that the discriminator is not a 70×70 PatchGAN, since the 4th layer should not be there. With that layer it seems like the discriminator is a 142×142 PatchGAN. Please correct me if I am mistaken.

    • Jason Brownlee August 7, 2019 at 7:50 am

      I believe you are mistaken.

      You can learn more about the 70×70 patch gan in greater detail in this post:

      • Villem Lassmann August 7, 2019 at 8:17 pm

        That example has the same structure, 6 layers of Conv2D (including the last one). But when looking at the beginning of the post where You are calculating the receptive field with 5 layers of Conv layers. The calculation also states that there are only 3 layers of Conv2D with a stride of 2. I believe that the layer named C512 should be the second to last layer.

        • Jason Brownlee August 8, 2019 at 6:40 am

          I believe the implementation matches the official implementation described here:

          • Margie July 28, 2022 at 8:18 pm

            Hi Jason, thanks for the great tutorials. I agreed with Villem that the current discriminator model is a 142×142 PatchGAN. For a 70x70PatchGAN, I think it should be only 3 layers with 4×4 kernel and 2×2 stride (remove the C512).

            If anyone else has the same confusion with me, please let me know. thanks:)

          • James Carmichael July 29, 2022 at 10:07 am

            Thank you all for the feedback!

      • Hind AlDabagh January 14, 2022 at 2:22 pm

        Sorry what’s the link?? This link is the same as the original one.

  4. Vasudevakrishna August 20, 2019 at 3:50 pm

    Thanks for the tutorial.
    My question is in original paper they are giving the direction as configurable parameter.
    But in your implementation I am unable to see that one.
    How can do that for both direction.
    Please explain

    • Jason Brownlee August 21, 2019 at 6:34 am

      I show how to translate images in both directions in the above tutorial.

  5. Alex August 29, 2019 at 11:52 pm

    Many thanks for this amazing tutorial!

    PS “There are 1,097 images”… and then there are saves every 10970 steps, and 109700 steps overall

  6. Salman Sajd September 1, 2019 at 7:13 am

    Thanks for an amazing tutorial
    How we use GAN for motion transfer or which type of GAN will best for Motion Transfer?

    • Jason Brownlee September 2, 2019 at 5:24 am

      I don’t know off hand sorry, perhaps try a search on scholar.google.com

  7. Lin September 6, 2019 at 9:52 pm

    Hello, thanks for the great article.
    I have one question, but why you scale the image to [-1, 1] instead of [0, 1]?
    Does this make the model behave differently?

    • Jason Brownlee September 7, 2019 at 5:29 am

      Because the generator generates pixels in that range, and the discriminator must “see” all images pixels in the same range.

      The choice of -1,1 for pixels is a gan hack:

      • Dang Tuan Hoang September 12, 2019 at 1:45 pm

        Hi sir, is it possible to train this model with inputs and output of different sizes?
        For example, I have 3 image a,b,c with size 50x50x3. I want the model to generate c from a,b. First I append a and b to get d with size 50x100x3, then use d as input, c as output

        • Jason Brownlee September 12, 2019 at 1:49 pm

          Yes, you can use different sized input and output, although the u-net will require modification.

          • Dang Tuan Hoang September 16, 2019 at 5:09 pm

            Could you give me some more details about how I would need to modify the U-Net in my case? I’m not very familiar with this architecture.

          • Avatar
            Jason Brownlee September 17, 2019 at 6:24 am #

            Sorry, I don’t have the capacity to prepare custom code for you.

            Perhaps experiment with adding/subtracting groups of layers to the decoder part of the model and see the effect on the image size?

          • Avatar
            Dang Tuan Hoang September 17, 2019 at 12:37 pm #

            I know you are very busy, so I didn’t ask for custom code; I just need something to start with. Thanks for the suggestion, sir!

          • Avatar
            Jason Brownlee September 17, 2019 at 2:34 pm #

            Perhaps start with just the function that defines the model and try playing around with it.

        • Avatar
          mrogozin September 25, 2021 at 10:53 pm #

          Did you find a solution for this? I am struggling with the same issue. Thanks

  8. Avatar
    David September 27, 2019 at 9:17 am #

    Hi Jason,

    First of all, thank you very much for posting this tutorial, I learned a lot from it.

    I have a question.

    Do you think the performance would be better if I left the picture resolution as it is rather than compressing the images?
    I ask because the differences between my source and target pictures are very minor.

    Thank you!


    • Avatar
      Jason Brownlee September 27, 2019 at 1:15 pm #

      Thanks, I’m happy that it helped.

      Interesting idea. You mean working with TIFF images or other lossless formats?

      Probably not, but perhaps test it to confirm.

  9. Avatar
    Alice October 16, 2019 at 6:16 pm #


    How can I increase the speed of training? It uses a very small portion of GPU memory.

    • Avatar
      Jason Brownlee October 17, 2019 at 6:26 am #

      Some ideas:

      Use less data.
      Use a smaller model.
      Use a faster machine.

      • Avatar
        Alice October 17, 2019 at 8:33 am #

        I am using a machine with 8 gpus (8 X p4000) 🙂
        I mean, for example, while training on darknet, changing the batch size directly affects GPU memory usage. But this code uses only 100 MB of each GPU, and the batch size doesn’t affect it. So I need an adjustment like the one in darknet so that I can use the full capability of the GPUs.

  10. Avatar
    Samuel October 17, 2019 at 8:30 am #

    Swapping out the training data for the SEN1-2 dataset had amazing results. I can now translate Sentinel 1 images to RGB Sentinel 2! Many thanks for such a thorough tutorial.

    • Avatar
      Jason Brownlee October 17, 2019 at 1:49 pm #

      Well done!

      I would love to see an example of a translated image.

  11. Avatar
    Mohammad October 20, 2019 at 12:03 pm #

    Hi Jason,

    Thank you so much for your great website, it is fantastic.

    I was wondering what your opinion is about the future research direction for this area of research?


    • Avatar
      Jason Brownlee October 21, 2019 at 6:14 am #


      Sorry, I don’t have thoughts on research directions – I try to stay focused on the industrial side these days.

  12. Avatar
    Michael October 28, 2019 at 9:08 am #

    Awesome tutorial on Pix2Pix. Your other GAN articles were great too and very helpful. After reading your tutorials, I was able to implement my own Pix2Pix project. All the code is on my GitHub. https://github.com/michaelnation26/pix2pix-edges-with-color

  13. Avatar
    sss November 17, 2019 at 10:43 pm #

    – python version: 3.6.7
    – tensorflow-gpu version: 2.0.0
    – keras version: 2.3.1
    – cuDNN version:10.0
    – CUDA version:10.0

    mnist_mlp.py (https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py) works perfectly but code which is given below gives me this error:

    tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node conv2d_7/convolution (defined at C:\Users\ACSECKIN\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_4815]

    Function call stack:

    • Avatar
      Jason Brownlee November 18, 2019 at 6:45 am #

      Perhaps try running on the CPU first to confirm your environment is working?

      • Avatar
        sss November 18, 2019 at 6:04 pm #

        The code works on the CPU, but I want to run it on the GPU to finish in less time. The CPU time is about 16 hours. I am open to any alternative that decreases the training time.

        • Avatar
          Jason Brownlee November 19, 2019 at 7:38 am #

          That is odd. I ran the examples on GPU without incident, specifically on EC2.

          Perhaps try that?

          • Avatar
            Jack November 27, 2019 at 6:12 pm #

            Based on my experience, I receive this error message when my GPU does not have enough memory to handle the process. Maybe try reducing the computational workload by using a smaller image? If not, use CPU if you are fine with it.

          • Avatar
            Jason Brownlee November 28, 2019 at 6:33 am #

            Great suggestions!

  14. Avatar
    Ivan November 23, 2019 at 9:05 pm #

    Hi guys,

    It takes 8 hours to train the model on GPU (Floydhub).
    But several different models were saved in the process.

    Can you explain why?

    • Avatar
      Jason Brownlee November 24, 2019 at 9:17 am #

      Perhaps try training on a large EC2 instance; it is much faster.

      Models are saved periodically during training because we cannot know which model will be good without using it to generate images.

  15. Avatar
    Jack November 26, 2019 at 2:07 pm #

    Thank you for your awesome post, it is really detailed and helpful! I have a question about normalizing from [0, 255] to [-1, 1]. My images are single channel and the maximum and minimum pixel values vary for each image, from 0 to around 3-4 (depending on the image). How should I go about normalizing the images? Should I take the maximum of the whole batch of samples and normalize, or should I take the maximum for each sample and normalize each individually?

    Also, when translating new images, what would the values of the output image be? Would they be between -1 and 1? If yes, how should I “denormalize” the values back to the original range? Thank you for your help!

    • Avatar
      Jason Brownlee November 26, 2019 at 4:10 pm #


      I recommend selecting a min and max that you expect to see for all time and use that to scale the data. If not known or it cannot be known, use the limits of the domain (e.g. 0 and 255).
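
      A minimal sketch of this idea, assuming a fixed range chosen in advance. The names `DATA_MIN`, `DATA_MAX`, `scale_to_unit_range` and `unscale_from_unit_range` are hypothetical and not part of the tutorial code, and the range of 0 to 4 is only an illustration. The same fixed range is used to map generated pixels back to the original units.

```python
import numpy as np

# Hypothetical fixed limits chosen once for the whole dataset (an assumption).
DATA_MIN, DATA_MAX = 0.0, 4.0

def scale_to_unit_range(x, lo, hi):
    """Map values in [lo, hi] linearly onto [-1, 1]."""
    x = np.asarray(x, dtype="float32")
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale_from_unit_range(x, lo, hi):
    """Invert scale_to_unit_range: map [-1, 1] back onto [lo, hi]."""
    x = np.asarray(x, dtype="float32")
    return (x + 1.0) / 2.0 * (hi - lo) + lo

pixels = np.array([0.0, 1.0, 2.0, 4.0], dtype="float32")
scaled = scale_to_unit_range(pixels, DATA_MIN, DATA_MAX)
restored = unscale_from_unit_range(scaled, DATA_MIN, DATA_MAX)
print(scaled, restored)
```

      Using per-image minima and maxima instead would make the inverse mapping depend on values that are unknown for a newly generated image, which is one reason a single fixed range is simpler.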

      • Avatar
        Jack November 27, 2019 at 6:11 pm #

        Thank you for your suggestions! How would you suggest I “de-normalize” the data during testing? Should I use the same range (I am taking the range from the training data) and reverse the process on the test data?

  16. Avatar
    Syd Rawat December 1, 2019 at 2:50 pm #

    Hi Jason,

    Thank you very much for sharing such an in-depth analysis of the Pix2Pix GAN. It is really helpful for early career researchers like me who don’t have a CS background. I thought of applying this to solving inverse problems in Digital Holographic Microscopy and I am now intrigued by the preliminary results I have got. As you know, the output of the model is a translated image, hence it is not possible to calculate the model accuracy. I am looking for an image quality metric such as SSIM. Do you have any suggestions?

    Thank You,

    PS: As this post helped me enormously, I would like to cite your works on GANs in the future.

  17. Avatar
    Anthony December 15, 2019 at 1:24 am #

    Hi! I am currently implementing this with images of shape (256, 512, 3) and keep running into the following error:

    “ValueError: A target array with shape (1, 16, 16, 1) was passed for an output of shape (None, 16, 32, 1) while using as loss binary_crossentropy. This loss expects targets to have the same shape as the output.”

    I assume that this is due to the downsampling? Any help would be appreciated

    • Avatar
      Jason Brownlee December 15, 2019 at 6:08 am #

      Perhaps start with square images, get it working, then try changing to rectangular images?

      • Avatar
        Anthony Mäkelä December 15, 2019 at 6:21 am #

        Hmm, alright! Could you explain why you use target_size=(256, 512) instead of (256, 256)?

        • Avatar
          Jason Brownlee December 16, 2019 at 6:03 am #

          The images are 256×512 – as they contain the input and output images together.

          We load them and split them into two separate square images 256×256 when working with the GAN.
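
          The split can be sketched with NumPy slicing. This is an illustrative sketch, not the tutorial's loading code: `split_pair` is a hypothetical helper, and a synthetic array stands in for a loaded 256×512 image.

```python
import numpy as np

def split_pair(combined):
    """Split an (H, 2W, C) array into its left and right (H, W, C) halves."""
    half = combined.shape[1] // 2
    return combined[:, :half], combined[:, half:]

# Synthetic stand-in for one loaded image: black left half, white right half.
combined = np.zeros((256, 512, 3), dtype="uint8")
combined[:, 256:] = 255

left, right = split_pair(combined)
print(left.shape, right.shape)
```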

  18. Avatar
    MK December 16, 2019 at 11:45 pm #

    The discriminator error seems to be going to zero pretty quick, any tips to avoid this?

    • Avatar
      Jason Brownlee December 17, 2019 at 6:36 am #

      Perhaps try running the example a few times and continue running – it may recover.

  19. Avatar
    Avdudia December 17, 2019 at 4:58 am #

    Thank you for this tutorial and simple code. I used it to perform image-to-image translation from Köppen–Geiger climate classification map ( https://en.wikipedia.org/wiki/K%C3%B6ppen_climate_classification ) to real satellite data, with truly amazing results, but I have a question.

    In my strategy I create nearly one thousand pairs of 256×256 tiles from the Köppen–Geiger map (present in the Wikipedia article above), and a high-resolution satellite map of the Earth. In order to minimize deformation on tile pairs near the poles I use an orthographic projection. This gives me nice pairs of images for GAN training (see https://photos.app.goo.gl/eGvpXghUtCB9kqkX6 ).

    I trained the GAN until the end (n_epochs=100) with amazing results. Using training data gives truly convincing satellite map validation (https://photos.app.goo.gl/a4EV6Gh15hAnYokm7). Even with hand-painted maps or with a source image converted from a random image into the Köppen–Geiger colormap, the results are very nice (https://photos.app.goo.gl/eGbFmTH7YqYi4xfu5).

    However, I noticed that the result lacked a “relief” effect. Moreover, on large landmasses where the climate does not change but the topography noticeably affects the satellite view (e.g. the Tibetan Plateau or the Grand Canyon), the model produces “flat” satellite views.

    As the climate map is composed of only 29 different indexed colors (plus the one I added for oceans), a simple label-to-image translation could be used, instead of using a full RGB climate image as input.

    So my idea was to store a heightmap of the earth on the first channel of the input image, and the normalized indexed climate color on the second channel. The third channel is kept unused. It results in a Red-Green image where the Red channel is the heightmap and the Green channel is normalized climate index (see https://photos.app.goo.gl/cN1cmCNLSXwwqzNB9).
    The problem is that training with these input images gives bad results compared to my first try (only climate data). Results were already convincing after 30 epochs in my first try, with smooth transitions between climates, while here the boundaries are clearly visible in generated images (see https://photos.app.goo.gl/Q1vjjeY8ewWrCZYv5 ).
    I tried to run the training several times to ensure that it was not purely bad luck, with the same result.

    I don’t understand because climate index can clearly be stored on one channel without information loss, and heightmap provides additional data, so it should improve the results. Is it simply because it needs more epochs ?

    Thank you in advance and sorry for the long post and for my english, it is not my native language.

    • Avatar
      Jason Brownlee December 17, 2019 at 6:42 am #

      Well done!

      Very cool application.

      Two thoughts off the cuff. One would be to make image synthesis conditional on two input images (source and the height map). A second would be to have 2 steps – one step to generate the initial translation and a second to improve the generated image with a height map.

      I’m eager to hear how you go!

      • Avatar
        Avdudia December 18, 2019 at 7:32 am #

        Thank you very much! The aim is to develop a tool for worldbuilding and create realistic maps of imaginary planets (following Artifexian’s tutorials https://www.youtube.com/watch?v=5lCbxMZJ4zA&t=1s ).

        I used the idea of using R and G channels for heightmap and climate following this thread concerning the pytorch implementation : https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/498. They recommend to concatenate the input images, but it seems that your code is limited to 3 channels and as I’m a complete beginner I still don’t know how to use more than one image as input.

        However, it seems that training for more epochs actually gives good results with my method. Maybe 100 is not enough, so I restarted it with a limit of 1000 epochs. However, I have to redo the first 100, and I run the code on Google Colab, which seems to be very unstable (I only managed to reach 100 epochs twice).

        Do you have a tutorial on how to make complete checkpoints in order to continue the training in case of a crash? If I understand correctly, your summarize_performance function only saves the generator model, so we would have to save the whole gan_model and reload it for later training. Do you have documentation or examples concerning this?

        Thank you so much for your tutorial. I’ll keep you informed on later developments !

        • Avatar
          Jason Brownlee December 18, 2019 at 1:28 pm #

          Yes, the above code already saves the model every n steps.

          See the summarize_performance() function.

  20. Avatar
    JS December 18, 2019 at 2:06 am #

    Should I approach this the same way if I have images containing white backgrounds, similar to the edges2shoes dataset?

    • Avatar
      Jason Brownlee December 18, 2019 at 6:10 am #

      Perhaps test it with a prototype?

      • Avatar
        JS December 18, 2019 at 8:01 pm #

        For some reason I end up only with blank white images…

  21. Avatar
    Sirine December 25, 2019 at 5:56 am #

    Thanks for the wonderful tutorial. How can I adapt the generator and the discriminator in order to translate from a (2, 64) matrix to a (2, 64) matrix?

  22. Avatar
    Elbek Khoshimjonov January 2, 2020 at 7:59 pm #

    Thanks Jason for great post!
    I have tried this code, but images do not appear to be good enough, and discriminator loss becomes 0 after 10-15 epochs.

    • Avatar
      Jason Brownlee January 3, 2020 at 7:27 am #

      Perhaps try using a different final model or re-fit the model?

  23. Avatar
    Arsal January 28, 2020 at 10:55 pm #

    I want to continue training from the last checkpoint stored. Can you help me in resuming the training of a model from last checkpoint.

    • Avatar
      Jason Brownlee January 29, 2020 at 6:36 am #

      Yes, load the model as per normal and call the train() function.

  24. Avatar
    Arsal January 28, 2020 at 10:57 pm #

    I have a dataset consisting of 216 images. I trained for 100 epochs but unfortunately the results are not good. Can you help me improve the results?

  25. Avatar
    Laura January 30, 2020 at 3:19 am #

    Thank you for this wonderful tutorial! It has been extremely helpful. I was wondering if you had considered data augmentation?

  26. Avatar
    Laura January 30, 2020 at 3:43 am #


    This is probably a newbie question, but I am new to GANs. In my limited experience with deep CNNs, I used the validation data during the training process to evaluate how well the model was “learning”. I then had another dataset I called the “test” dataset that I used after the training process was complete. Here it seems like you don’t use any validation during the training process, and what you call validation is what I call the test dataset. Is that something unique to GANs, or can validation be included in the training process?

  27. Avatar
    Michael February 1, 2020 at 12:34 am #

    Hi! I’m running this on a Titan V and it seems to be running extremely slowly. Any ideas as to why?

    • Avatar
      Jason Brownlee February 1, 2020 at 5:56 am #

      Training GANs is slow…

      Perhaps check you have enough RAM and GPU RAM?
      Perhaps compare results on an ec2 with a lot of RAM?
      Perhaps adjust model config to use less RAM?

    • Avatar
      Lisa October 28, 2020 at 8:57 pm #

      I suggest using google colab. You can use their GPUs for training. It’s much faster!

      • Avatar
        Jason Brownlee October 29, 2020 at 8:02 am #

        GPUs are practically a requirement when working with GANs.

  28. Avatar
    pramod kumar February 9, 2020 at 2:42 am #

    Sir, can I know how you downloaded the satellite and map images? Can you please help me download my own dataset for this project?

    • Avatar
      Jason Brownlee February 9, 2020 at 6:24 am #

      See the section of the above tutorial “Satellite to Map Image Translation Dataset” on how to download the dataset.

  29. Avatar
    Vaibhav Vijay kotwal February 24, 2020 at 6:28 pm #

    Hi Jason,
    From the theory, we understand that the discriminator learns the objective loss function. However, referring to define_GAN(), line 56 in the code, I cannot see the objective loss learnt by the discriminator being passed to the GAN model. I see that the model does not converge as expected.

    Thanks and Regards,
    Vaibhav Kotwal

  30. Avatar
    Fırat Erdem February 26, 2020 at 10:49 pm #

    Thanks for the great tutorial. I need help with something. I want to see accuracy metrics for both the train dataset and the test dataset throughout the training process, and I want to see them for each epoch, not for each step, like in a standard CNN training procedure. How can I add this to the code? I couldn’t apply it because it is different from standard CNN code. I would really appreciate it if you could answer. Thank you.

  31. Avatar
    Tom March 8, 2020 at 5:13 am #

    Can a Pix2Pix GAN be saved and loaded again without training it again?

  32. Avatar
    Ehsan Karimi March 12, 2020 at 2:10 am #

    Hi Jason,
    Thanks for the great tutorial. I have a problem with image scales. In the first step, after splitting the input images, I checked the image size: instead of 256*256 pixels they are 134*139 with a background. Also, in the step that translates a given standalone image using the model, the output should be 256*256, the same as the input, but I get a 252*262 output, again with a background.
    I was wondering if you would mind letting me know where the problem is?
    Thanks in advance

  33. Avatar
    Paolo March 13, 2020 at 8:13 am #

    Great work Jason. Just one question: do you believe that this approach could work using an RGB satellite image against its mask image, to do some kind of image segmentation?
    Thanks in advance

    • Avatar
      Jason Brownlee March 13, 2020 at 8:24 am #

      Perhaps try it. Prototypes are a fast way to get answers.

      • Avatar
        Paolo March 13, 2020 at 8:31 am #

        I mean, the mask image would have just 2 colors (yes/no) … this was my concern. Thanks

        • Avatar
          Marja June 1, 2021 at 8:07 pm #

          You can add two ‘dummy’ layers to the mask image, so that it is compatible as a target image to the RGB source image. Your RGB image as numpy array will be in the shape of (nr images, width, height, nr bands) where nr bands is three. Your mask image will be in shape (nr images, width, height, nr bands) where nr bands is one. So if you add two bands to the mask image, with e.g. only -1 values, then they are compatible.
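
          The padding described above can be sketched in NumPy. `pad_mask_to_three_bands` is a hypothetical helper name, and the random mask is only placeholder data scaled to -1/1; nothing here comes from the tutorial code.

```python
import numpy as np

def pad_mask_to_three_bands(mask, fill=-1.0):
    """Append two constant bands to a (n, width, height, 1) mask array."""
    filler = np.full(mask.shape[:-1] + (2,), fill, dtype=mask.dtype)
    return np.concatenate([mask, filler], axis=-1)

# Placeholder binary masks already scaled to the [-1, 1] pixel range.
mask = np.random.choice([-1.0, 1.0], size=(4, 256, 256, 1)).astype("float32")
padded = pad_mask_to_three_bands(mask)
print(padded.shape)
```

          After prediction, only the first band of the generated image would carry the mask; the two filler bands can be discarded.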

  34. Avatar
    Iqbal March 19, 2020 at 9:05 pm #

    This is an interesting article. I am new to this and have some questions about the first script:

    1. I saw that the loaded data is the maps dataset in the train and test folders. Which 3 samples were loaded from the train folder, given that the result was (1096, 256, 256, 3) (1096, 256, 256, 3)? I understand that 1096 is the number of images in that folder, but I still don’t understand the 256, because when I open a picture its size is not the same as what was loaded.

    2. I saw that the folders contain images for train and test. For the train folder, for example, 1.jpg contains two images side by side. How are the left picture and the right picture produced, or does the code separate them itself? Also, for the test folder, does the code do this itself, or do I have to save the images first?

    I need some explanation to prepare for using it in the future. Thank you.

  35. Avatar
    yasser March 22, 2020 at 10:12 am #

    Hi, thank you for your work!
    I need your help. I need the same model, but with a generator input of one channel instead of three.
    I have tried to change it but it doesn’t work. Thank you.

    • Avatar
      Jason Brownlee March 23, 2020 at 6:09 am #

      Sorry to hear that.

      Perhaps confirm that your images are grayscale (1 channel), then change the model to expect 1 channel via the input shape.
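
      A small sketch of that check, assuming the images are held in a NumPy array. `ensure_channel_axis` is a hypothetical helper; the resulting `image_shape` is what would be passed as the model's input shape.

```python
import numpy as np

def ensure_channel_axis(images):
    """Return images as (n, height, width, channels), adding a channel axis if missing."""
    images = np.asarray(images)
    if images.ndim == 3:  # (n, height, width): grayscale without a channel axis
        images = images[..., np.newaxis]
    return images

# Placeholder grayscale batch without an explicit channel axis.
gray = np.zeros((5, 256, 256), dtype="float32")
gray = ensure_channel_axis(gray)

# Shape of a single image, e.g. (256, 256, 1) rather than (256, 256, 3).
image_shape = gray.shape[1:]
print(image_shape)
```

      If the target images stay RGB, the generator's output layer would still need 3 filters, so the input and output channel counts have to be set separately.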

  36. Avatar
    runnergirl March 23, 2020 at 5:53 am #

    Hi! I wanted to train the model on my own images. I prepared them just like in this tutorial. The sizes of X1 and X2 are the same. Data display works. But I get this error:

    ‘Got inputs shapes: %s’ % (input_shape))
    ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 2, 2, 512), (None, 1, 1, 512)]

    What have I done wrong?

    • Avatar
      Jason Brownlee March 23, 2020 at 6:16 am #

      I don’t know sorry.

    • Avatar
      Mick October 28, 2020 at 10:26 pm #

      Have you been able to solve this problem? I get the same issue.

    • Avatar
      GoComputing October 30, 2020 at 10:36 pm #

      This is probably because your input size is not divisible by 256.
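
      One way to see this is to trace the spatial size through the network. The sketch below assumes a U-Net with seven stride-2 encoder blocks plus a stride-2 bottleneck, all using 'same' padding (so each convolution yields ceil(size / 2)), and stride-2 transpose convolutions in the decoder (so each doubles the size). That layout is an assumption about the generator, and `first_skip_mismatch` is a hypothetical helper.

```python
import math

def first_skip_mismatch(size, n_encoder_blocks=7):
    """Return (decoder_size, skip_size) at the first mismatching skip connection, or None."""
    encoder_sizes = []
    for _ in range(n_encoder_blocks):
        size = math.ceil(size / 2)  # stride-2 convolution with 'same' padding
        encoder_sizes.append(size)
    size = math.ceil(size / 2)  # bottleneck, also stride 2
    for skip in reversed(encoder_sizes):
        size *= 2  # stride-2 transpose convolution with 'same' padding
        if size != skip:
            return (size, skip)
    return None

for s in (256, 512, 128, 300):
    print(s, first_skip_mismatch(s))
```

      Under these assumptions, a 128-pixel input reaches the deepest skip connection as a 2×2 decoder tensor against a 1×1 encoder tensor, which matches the shapes in the error message above.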

  37. Avatar
    ouis March 24, 2020 at 1:04 am #

    Hello, I have this problem and I don’t know why:
    ValueError: Graph disconnected: cannot obtain value for tensor Tensor(“input_14:0”, shape=(?, 256, 256, 3), dtype=float32) at layer “input_14”. The following previous layers were accessed without issue: [‘input_15’]

    • Avatar
      Jason Brownlee March 24, 2020 at 6:05 am #

      Perhaps confirm that your keras and tensorflow versions are updated?

  38. Avatar
    Harshit April 19, 2020 at 2:37 am #

    Hi. I tried to use a different dataset with this code, specifically the edges2shoes dataset, but I was not able to convert it into an npz file. Every time, I ran into a memory error. I have 16GB of RAM and that was still not enough. I managed to create multiple npz files though. How should I proceed?

    Also, could you be kind enough to make a TensorFlow/Keras tutorial on pix2pixHD, since it is much more accurate and performs better in side-by-side tests compared to normal pix2pix?

  39. Avatar
    Phil April 30, 2020 at 11:23 pm #

    Hi Jason. Great Article. Good explanation. Your articles gave me a good overview and starting point when I started developing my own networks. But I have two questions:
    1.) From what I can see, the original code on GitHub seems to be slightly different from your code in this article when it comes to how you connect an encoder and a decoder layer. On GitHub, the data passed from an encoder layer to a decoder layer (via skip connection) is unactivated, meaning that it is passed directly after the convolution (or batch norm/dropout), contrary to the solution here. Is this a mistake or a variation?
    2.) What does the flag ‘training=True’ do when calling a batch normalization layer or dropout layer?
    Thanks in advance.

  40. Avatar
    ElenaRR May 3, 2020 at 4:08 am #

    Hi Jason,
    thank you very much for this tutorial, it’s awesome!
    I have the problem that you mentioned at the end of the article. D1 loss goes to zero after 80-90 steps. Could you explain to me why this happens and how I can solve it?

    In addition to this, I can see that only one image is used in every iteration (one real, one fake) where n_batch = 1. Shouldn’t we use more than one pair of images to train in each step?

    # select a batch of real samples
    [X_realA, X_realB], y_real = generate_real_samples(dataset, n_batch, n_patch)

    Thank you very much!

  41. Avatar
    ElenaRR May 3, 2020 at 7:39 pm #

    I’ve seen that there is specific code in your book about this. Is it a more complete example for Pix2Pix, or is it the same?

    • Avatar
      Jason Brownlee May 4, 2020 at 6:19 am #

      The examples of pix2pix in the book are based on this example.

  42. Avatar
    shami May 11, 2020 at 1:02 am #

    long live 100 years …. superb tutorial with clear explanation

  43. Avatar
    bobbyP May 14, 2020 at 2:29 pm #

    Is there any way to use a Keras data loader/generator for on-the-fly image loading during training? Say, for example, your training set is really large and loading it all at once would result in out-of-memory errors? Thanks so much for all your tutorials, they are incredible!

  44. Avatar
    cans May 20, 2020 at 6:12 am #

    Hi Sir,
    First of all, I have read all your tutorials. You are helping me more than my consultant. Thank you so much.
    I am new to GANs, so sorry for my questions, but I can’t understand how to test this model.
    I will use the validation set, okay.
    After training the model, do I give it only the source image and not the target? I don’t get it.
    Or do I just load the model and train with the validation set?
    Sorry, I can’t explain myself well. I hope you understand me.

    • Avatar
      Jason Brownlee May 20, 2020 at 6:29 am #

      You’re welcome!

      Evaluating GANs is challenging. We do not use a validation set. Instead we generate images and look at them and see if they are good enough.

  45. Avatar
    cans May 21, 2020 at 2:39 am #

    Thank you for your answer, Sir.
    I want to try a Pix2Pix GAN for image enhancement.
    I’ll use low-contrast source images and high-contrast target images. What do you think? I’m trying to improve thermal images.
    I hope this system will work.

  46. Avatar
    Gruhit Patel May 25, 2020 at 1:44 pm #

    Here in Discriminator what is the need for Concatenating Source and Target images ? What effect would it have ?

    • Avatar
      Jason Brownlee May 26, 2020 at 6:12 am #

      Sorry, which section are you referring to? Where (which section/line?) do we concatenate images?

      • Avatar
        Gruhit Patel May 27, 2020 at 2:48 pm #

        In the discriminator, where we concatenate the source image and the target image.
        Actually, I’m building a GAN model for color transformation from gray to RGB.
        My discriminator and generator losses both fall to zero, so I wanted to know what particular effect the concatenation has on the discriminator. If you have any advice for my model, please share it too.
        Thanks in advance…

        • Avatar
          Jason Brownlee May 28, 2020 at 6:09 am #

          Here we are training a conditional model, e.g. generate a target image conditional on the source image.

          That is, it is the purpose of the model.
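
          In array terms, conditioning the discriminator amounts to stacking the source and target images along the channel axis, so that it judges the pair rather than the target alone. The sketch below shows only the shape bookkeeping with NumPy placeholders; it is not the model code itself.

```python
import numpy as np

# Placeholder batch of one source image and one target image, channels last.
src = np.zeros((1, 256, 256, 3), dtype="float32")
tgt = np.ones((1, 256, 256, 3), dtype="float32")

# Channel-wise concatenation: the discriminator sees a 6-channel input.
pair = np.concatenate([src, tgt], axis=-1)
print(pair.shape)
```

          For a grayscale-to-RGB model, the same stacking would give 1 + 3 = 4 channels.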

  47. Avatar
    Ujjayant Sinha May 27, 2020 at 12:54 am #

    Hello. I compared the images from the summarize_performance() to predictions on unseen ones, which turned out to be quite horrible. Can you suggest some ways to tackle this problem ?

    • Avatar
      Jason Brownlee May 27, 2020 at 7:58 am #

      Try training the model a few times, save many times during each run, choose a model that generates good images.

  48. Avatar
    Zsolt Lipcsei May 27, 2020 at 8:50 am #


    How can I use the generator model to predict images of any size? I mean not just square sizes. Is there any way at all?

    • Avatar
      Jason Brownlee May 27, 2020 at 1:26 pm #

      You will have to change the generator/discriminator and also the training dataset to the desired size.

  49. Avatar
    Muhammad Ammar Malik June 12, 2020 at 2:49 am #

    Thank you for the awesome post. I have 2 questions if you can answer please.

    First question, you have mentioned:
    “In this case, we will use the model saved at the end of the run, e.g. after 10 epochs or 109,600 training iterations.”

    Shouldn’t the training iterations be 10,960 after 10 epochs?

    Second question, what is the rationale behind using random index to generate real and fake samples? Why can’t we simply iterate over all the samples one by one to make sure no image is missed or used more than once in 1 training step?

    • Avatar
      Jason Brownlee June 12, 2020 at 6:18 am #

      Images are generated after every 10 epochs, it runs for 100 epochs, meaning we save 10 models along the way. Yes, that is a typo, we used the model after 100 epochs. Fixed.

      We can do it for all images. I wanted to work with one image to show that we can use the model ad hoc. Readers often find that step confusing, so I must demonstrate it.
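
      The step counts can be checked with a little arithmetic. This sketch assumes 1,096 training images (the array shape reported elsewhere in this thread), a batch size of 1, 100 epochs, and a save every 10 epochs; `training_schedule` is a hypothetical helper, not a function from the tutorial.

```python
def training_schedule(n_images, n_epochs=100, n_batch=1, save_every_epochs=10):
    """Return (batches per epoch, total training steps, steps between saves)."""
    bat_per_epo = n_images // n_batch
    n_steps = bat_per_epo * n_epochs
    save_interval = bat_per_epo * save_every_epochs
    return bat_per_epo, n_steps, save_interval

bat_per_epo, n_steps, save_interval = training_schedule(1096)
print(bat_per_epo, n_steps, save_interval)
```

      Under these assumptions, 10,960 steps corresponds to 10 epochs and 109,600 steps to the full 100 epochs.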

  50. Avatar
    Riccardo June 18, 2020 at 11:27 pm #

    thank you for the great tutorial, you helped me a lot!

    I have just one question: am I doing something wrong, or is it normal that for an input X I do not get a unique output Y?
    Let me explain better: if I repeat the prediction n times, I get n different Y images (I’m checking pixel differences).
    I’m adapting this example to another application, and it will only work if I get the exact same output every time.

    I tried to look for a random noise vector or something like that but it seems that this is not the case.

  51. Avatar
    Riccardo June 18, 2020 at 11:29 pm #

    ” I’m translating this this example to another application and having the exact same output will make it works” *

    • Avatar
      Riccardo June 18, 2020 at 11:31 pm #

      this tutorial helped me a lot!

      I have just one question: is it normal that if I repeat the prediction n times with the same input, I get n different outputs?

      I’m checking directly changes in pixel values in Y outputs.

      • Avatar
        Jason Brownlee June 19, 2020 at 6:15 am #

        Yes, this is expected given the stochastic nature of some of the layers.

        • Avatar
          Riccardo June 19, 2020 at 5:35 pm #

          thank you for the reply!
          I’m guessing that dropouts are introducing randomization, i’ll try without that.

          • Avatar
            Jason Brownlee June 20, 2020 at 6:08 am #

            Yes, also there are layers that inject noise directly.

          • Avatar
            Riccardo June 22, 2020 at 6:09 pm #

            I’m sorry if I seem annoying, but I do not see layers that directly introduce noise in the code you provided.

            I see convs, batchnorms and concatenates.
            Can you please tell me which layer is introducing noise directly?

            I think that I have missed something about these layers, but having read through the documentation, I thought I knew them pretty well.

            I really need to understand where these noise generators are and remove them in order to use a GAN for my application (maybe it is impossible, but I wish to try =) )

          • Avatar
            Jason Brownlee June 23, 2020 at 6:16 am #

            Sorry, my mistake, I was thinking of a different GAN.

  52. Avatar
    Sahil Singla June 22, 2020 at 7:17 am #

    Thanks for the great tutorial.

    I have one small doubt:
    Do we traverse over the complete dataset?
    We passed our entire dataset to the generate_real_samples function and everytime it chooses a random number, which could be same, if we traverse again and again.

    So, we might not be traversing over the complete dataset in single epoch?

    Please let me know your thoughts.


    • Avatar
      Jason Brownlee June 22, 2020 at 1:25 pm #

      You’re welcome.

      Correct. On average we cover the whole dataset many times.

      • Avatar
        Sahil Singla June 22, 2020 at 6:20 pm #

        So, there is a possibility of missing certain data points. This can become a problem if you have very few data points to work with.

        So should I change the code to make sure it traverses the entire dataset, or is it still OK if we don’t do that?

        • Avatar
          Jason Brownlee June 23, 2020 at 6:17 am #

          If you prefer. I’m not convinced it makes a difference, but could be a fun experiment.
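
For anyone who wants to run that experiment, here is a minimal sketch of sampling without replacement, so that every image pair is visited exactly once per epoch. It assumes the dataset is a pair of NumPy arrays, as loaded in the tutorial. The helper name `iterate_epoch` and the toy arrays are hypothetical and not part of the tutorial code.

```python
import numpy as np

def iterate_epoch(dataset, n_batch, rng):
    """Yield (source, target) batches that together cover the dataset once."""
    src, tgt = dataset
    # shuffle the sample indices once, then walk through them in order
    order = rng.permutation(src.shape[0])
    for start in range(0, len(order), n_batch):
        ix = order[start:start + n_batch]
        yield src[ix], tgt[ix]

# toy stand-in for the real image arrays: 10 "images" of shape 4x4x3,
# where image i is filled with the value i so it can be identified later
rng = np.random.default_rng(1)
ids = np.arange(10, dtype=np.float32).reshape(10, 1, 1, 1)
src = ids * np.ones((1, 4, 4, 3), dtype=np.float32)
tgt = -src

seen = []
for X_a, X_b in iterate_epoch((src, tgt), n_batch=3, rng=rng):
    seen.extend(int(v) for v in X_a[:, 0, 0, 0])
print(sorted(seen))
```

Each batch could then be passed to the training step in place of the randomly drawn batch. The last batch of an epoch may be smaller than n_batch.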

    • Avatar
      Riccardo June 23, 2020 at 6:08 pm #

      oh, ok no problem!
      I think I will investigate the stochasticity through the different convs and batch norm layers in order to make the net able to predict the same Y from an X input.

      best regards

  53. Avatar
    yacine June 30, 2020 at 7:30 pm #

    Thank you so much for this super clear explanation and code.

  54. Avatar
    Steve Newbold July 3, 2020 at 11:59 pm #

    If I wanted to use an input with three colour channels and a target of four colour channels, can this be configured or is it best to just create an additional black 4th channel on the input?

    I noticed some greyscale-to-colour models just use the same data in each channel to represent grey images, so I presumed it must be easier to do this than to make the model work with differing numbers of channels.

    Also, thanks for the excellent resource!

    • Avatar
      Jason Brownlee July 4, 2020 at 6:01 am #

      Off the cuff I recall the images have to have the same number of channels. Perhaps experiment/research and see if you can deviate from this norm.

  55. Avatar
    M July 8, 2020 at 10:42 am #

    Thanks for this great post!
    For your generator’s loss, how can I know whether you are (1) minimizing log(1 – D(G(x))) or (2) maximizing log D(G(x))?
    How can one change the loss function, any reading suggestions?
    Some people say the choice of generator’s loss can help the model to not get stuck in early stages of training.

  56. Avatar
    Rao July 21, 2020 at 7:02 am #

    Hello Jason,

    What would be the optimal loss values (Generator and Discriminator loss) of a successful conditional GAN model? Are the values same as an unconditional GAN ? (i.e around 0.7 or 0.6, as mentioned in your unconditional GAN article)

    Secondly, I have done the training of pix2pix for a certain image to image translation task in two different ways.

    1st method: Trained the discriminator patch outcome against a matrix of real or fake labels (as mentioned in this article)
    2nd method: The discriminator still gives a patch, but this time, the patch average was taken and was trained against a single value ( i.e avg value of the patch against a real or fake label).

    During training (towards the saving of a good model), the first method yields a patch avg value of about 0.4 for a real image pair and about 0.3 for a fake image pair.
    But the second method yields a patch avg value of about 0.0004 for both real and fake image pairs.

    Both these models yielded a good quality image with its Generator and the Discriminator loss standing around 0.7 and 0.6 respectively. My doubt is why such discrepancy with the avg patch values even though both the models yields a good quality image? Secondly, an avg patch value of 0.0004 doesn’t make sense even though this model yielded a good translated image.(Because as far as my understanding, each pixel values in the patch for a real pair should be close to 1 for a real pair and 0 for a fake image pair. This would mean that the avg of the patches should also be close to 1 for a real pair and 0 for a fake image pair).

    What should be the avg patch values for a good model? Any amount of insights into this would be greatly helpful. Hope I made sense.


  57. Avatar
    Gruhit Patel July 22, 2020 at 4:56 pm #

    Sir, Why exactly are we merging two images in discriminator ?? What effect does it have ?? And why are we not keeping just the colored image in Discriminator ??

    • Avatar
      Jason Brownlee July 23, 2020 at 6:03 am #

      The discriminator is given the input image and a target image and comments on whether the target is a real translation or a generated translation.

      • Avatar
        Jamal July 24, 2020 at 7:54 pm #

        Can the Pix2Pix GAN work for grayscale images?
        What if we use the same architecture as above for grayscale source and target images?

        • Avatar
          Jason Brownlee July 25, 2020 at 6:17 am #

          Modification of the model architecture is required.

  58. Avatar
    Zoya July 24, 2020 at 5:10 pm #

    I used different data for the source and target images. My source and target images are grayscale.
    But when I run the code, the discriminator loss goes to zero within very few iterations, while the generator loss is very high (for example 9782.150) and keeps going.
    It does not decrease. What can I do?

  59. Avatar
    John July 24, 2020 at 7:44 pm #

    I have different source and target images. My source and target images are grayscale, but my discriminator loss gets very low and reaches zero while the generator loss is very high.

    What can I do now? Can the Pix2Pix GAN work for grayscale images?

    • Avatar
      Jason Brownlee July 25, 2020 at 6:17 am #

      You may need to tune the model – explore – in order to discover how to best modify the model architecture to support grayscale images.

  60. Avatar
    Gruhit Patel July 26, 2020 at 12:11 pm #

    What sort of modification you think are required ?? Like the architecture won’t be U-shaped ?? Or is the loss that needs to be changed ??

    • Avatar
      Jason Brownlee July 26, 2020 at 1:40 pm #

      It is hard to know – experimentation is required, perhaps start with tuning the learning rate with a similar network structure adjusted for the changed number of channels.

  61. Avatar
    Jeremy Bolton July 28, 2020 at 7:50 pm #


    Thanks for your great work!

    I found your model above and tried it with the cityscapes images. I trained on ~3000 image pairs, from segmentation to photographic pictures. First I converted the images to 256×256 and kept the 100 epochs, then trained for 250 epochs. The results were good, but blurry, so I converted the original 1024×2048 resolution images to 512×512 and trained on them for 250 epochs.

    The results didn’t really improve, but I’d like to get less blurry pictures. Increasing the number of epochs or the image resolution didn’t change a lot, so my question is: do I need to change the architecture of the models? If yes, can you give me a hint as to which further layers I should use?

    Thank you very much and keep up the good work!

    • Avatar
      Jason Brownlee July 29, 2020 at 5:50 am #

      You may need to experiment with the model architecture and learning hyperparameters in order to discover what works best for your specific dataset.

      • Avatar
        Jeremy Bolton July 30, 2020 at 8:40 pm #

        Thanks for your reply, Jason.

        Can you give me a hint, what architectural changes I should start with if I want to train with 512×512 resolution images or even bigger instead of 256×256? More conv2d layers, dropout layers or multiple discriminators/generators as in pix2pixhd?

        Thank you.

        • Avatar
          Jason Brownlee July 31, 2020 at 6:18 am #

          A good approach is to scale the number of blocks up or down from the current to meet the desired image dimensions.

          I would encourage you to experiment and observe the effects on input/output shapes to get a feeling for it.
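
To get a feeling for the shapes before touching the model, the number of blocks can be estimated with plain arithmetic. This is only a sketch under two assumptions: each encoder block uses a stride of 2 with 'same' padding, so it halves the width and height, and the encoder is taken down to a 1×1 bottleneck. The function name `count_halvings` is hypothetical.

```python
import math

def count_halvings(image_size, bottleneck_size=1):
    """Number of stride-2 blocks needed to shrink image_size to bottleneck_size."""
    steps = math.log2(image_size / bottleneck_size)
    if steps != int(steps):
        raise ValueError("image_size / bottleneck_size must be a power of two")
    return int(steps)

# compare the tutorial's resolution with two larger ones
for size in (256, 512, 1024):
    print(size, count_halvings(size))
```

Under these assumptions a 512×512 input needs one more halving step than a 256×256 input, and the decoder needs one matching upsampling step to return to the input size.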

  62. Avatar
    phaneeshwar August 2, 2020 at 4:18 am #

    dataset = LoadRealData('C:/Users/Eeshwar/Desktop/deep learning/maps11.npz')
    print('Loaded', dataset[0].shape, dataset[1].shape)
    Imgshape = dataset[0].shape[1:]
    dmodel = DiscriminatorModel(Imgshape)
    gmodel = GeneratorModel(Imgshape)
    ganmodel = GANModel(dmodel,gmodel,Imgshape)


    Loaded (1096, 256, 256, 3) (1096, 256, 256, 3)
    WARNING:tensorflow:Discrepancy between trainable weights and collected trainable weights, did you set model.trainable without calling model.compile after ?

    InvalidArgumentError: data[0].shape = [4] does not start with indices[0].shape = [2]
    [[{{node training/Adam/gradients/gradients/loss_3/dense_2_loss/Mean_grad/DynamicStitch}}]]

    Sir could you please help me to resolve this issue. I Thank You in advance

  63. Avatar
    Dhruv Agarwal August 2, 2020 at 4:41 pm #

    Hello sir, the tutorial was great, but i have 2 questions.

    1) In the define_discriminator() function, you have set the loss_weights parameter to 0.5, to slow down the training of discriminator. Can’t we reduce the learning rate of the discriminator model to slow the training, instead of specifying the loss_weights parameter?

    2) In the define_gan() function, why was there even a need to specify loss_weights parameter over there?

    • Avatar
      Jason Brownlee August 3, 2020 at 5:45 am #


      Perhaps try it and see.

      We do set loss_weights for the GAN.

      • Avatar
        Dhruv Agarwal August 4, 2020 at 1:14 am #

        OK, I will try reducing the learning rate instead of specifying the loss_weights parameter in define_discriminator(). I am sorry, but I still do not get the answer to the second question, i.e. why do we need to specify the loss_weights parameter in the define_gan() function?

        • Avatar
          Jason Brownlee August 4, 2020 at 6:42 am #

          To match the implementation described in the paper.

          It has the effect of giving most attention to L1 and a tiny bit of attention to cross entropy.

          This is explained in that section of the tutorial, perhaps re-read?
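
As a rough illustration of what loss_weights=[1,100] does, the sketch below computes the two terms by hand with NumPy and adds them with those weights. The array values are made up, and the helper names are hypothetical. It only mirrors the arithmetic of a weighted sum of losses, not the Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, with predictions clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference over all values."""
    return float(np.mean(np.abs(y_true - y_pred)))

def composite_loss(patch_true, patch_pred, img_true, img_pred, weights=(1.0, 100.0)):
    """Weighted sum of the adversarial (cross-entropy) and L1 (MAE) terms."""
    adv = binary_crossentropy(patch_true, patch_pred)
    l1 = mean_absolute_error(img_true, img_pred)
    return weights[0] * adv + weights[1] * l1, adv, l1

# made-up example: an undecided discriminator and a generated image
# that is off by 0.2 at every pixel value
patch_true = np.ones((1, 16, 16, 1))
patch_pred = np.full((1, 16, 16, 1), 0.5)
img_true = np.zeros((1, 256, 256, 3))
img_pred = np.full((1, 256, 256, 3), 0.2)

total, adv, l1 = composite_loss(patch_true, patch_pred, img_true, img_pred)
print(total, adv, l1)
```

With these numbers the weighted L1 term is roughly 20 while the cross-entropy term is below 1, which is the sense in which most of the attention goes to L1.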

  64. Avatar
    Amine Zera August 4, 2020 at 9:35 pm #

    Hello sir, thank you for the great tutorial !
    I am new to Machine Learning,
    I want to change the clothing of people in images or videos. Should I train pix2pix on a clothes dataset?
    My second question: I don’t want to change anything in the image except the clothes, but if I apply pix2pix to the image it will change everything. How can I target only the clothing in an image?
    Thanks again for your great work!

  65. Avatar
    Alex Westcott August 19, 2020 at 11:04 pm #


    I have trained the exact model outlined in the tutorial with the same data-set quite a few times and the losses of the discriminator are always consistently 0.000 after around 5000 steps. Looking at the loss to more significant figures, shows that the loss is greater than zero, hence, when you state that, if the discriminator loss stays at zero for a long time then there is training failure, do you mean zero to 3 decimal places (0.000)?

    The generator still improves after the discriminator loss reads 0.000; however, I presume that the discriminator is no longer having a significant impact on the training of the generator.

    Thank you for the great tutorial, it helped a lot!

    • Avatar
      Jason Brownlee August 20, 2020 at 6:43 am #

      Zero loss indicates a failure mode.

      Recall that GANs do not converge.

      Are you saving models along the way during training?
      Are you able to inspect the progress of training, does it get good then go bad or is it bad the entire time?

      • Avatar
        Alex Westcott August 20, 2020 at 6:02 pm #

        I am saving the model every 5 epochs, and the predicted images do improve slightly during training, and by the end look reasonably good (I presume that the discriminator hasn’t had an impact on the quality and it is just the generator improving by itself). The losses of both the discriminator and generator decrease to start with, but the discriminator loss slowly decreases to 0 and the generator loss stays pretty low (between 1 and 5).

        I have assumed that the discriminator is too good at telling real and fake images apart, as when I removed a few layers from it, its loss didn’t decay to 0 during training.

  66. Avatar
    Reshma Jindal August 20, 2020 at 1:51 am #

    I have around 3700 images to train on.

    Can you give rough guidance on the hyperparameters to set (like n_epochs, n_batch), as I’m encountering the following issue?
    Please help in resolving it.

    /home/reshmajindal/.local/lib/python3.6/site-packages/keras/engine/training.py:490: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set model.trainable without calling model.compile after ?
    'Discrepancy between trainable weights and collected trainable'

  67. Avatar
    Awadelrahman M. A. Ahmed August 31, 2020 at 2:33 am #

    Thanks for this GREAT detailed tutorial. One question I have in mind is how to adapt the model to input different sizes of images? i.e. if the training/validation images have different height and width values?

    • Avatar
      Jason Brownlee August 31, 2020 at 6:17 am #

      You’re welcome.

      Typically images are all resized to the same width and height expected by the model.

      • Avatar
        Awadelrahman M. A. Ahmed September 2, 2020 at 6:17 am #

        Resizing is a rather flexible term 🙂
        Cropping big images leads to losing some information. Enlarging small images might lead to blurry images. Super-resolution is computationally expensive and needs auxiliary models. What do you think is a good way to “resize” images so they work properly with this model?

        • Avatar
          Jason Brownlee September 2, 2020 at 6:35 am #

          I recommend exploring many different approaches and discovering what works best for your specific project.

          • Avatar
            Awadelrahman M. A. Ahmed September 2, 2020 at 9:02 am #

            YES!! the best way to find out is by doing it! this why I feel addicted to this machinelearningmastery :p

  68. Avatar
    Nalin Nagar September 9, 2020 at 11:36 am #

    Is there a way to input your own image? I haven’t seen any demonstrations that are able to input your own image and I have tried doing it myself but to no avail.

    • Avatar
      Jason Brownlee September 9, 2020 at 1:34 pm #

      Yes, the last part of the tutorial shows this.

  69. Avatar
    Harry September 9, 2020 at 4:10 pm #

    Hi everyone. Thank you for super guideline for implementation. I have one question. Can i generate 1024×1024 px image by using pix2pix-GAN?

    • Avatar
      Jason Brownlee September 10, 2020 at 6:22 am #

      Perhaps try scaling up the model for large images and see what kind of results you get.

      I would expect quality to fall off. It might be easier with a model based on the progressive-growing architecture.

  70. Avatar
    Harry September 9, 2020 at 4:23 pm #

    By the way, my dataset image size is smaller than 1024px

  71. Avatar
    Bidesh Sengupta September 14, 2020 at 2:52 pm #


    It is a really good tutorial. I wish to apply this concept to my work. But I want to give some numerical parameters (say P1, P2, P3…) along with image as input and wish to get the image as output.

    Can you guide me on how to change the code to implement this? Is it at all possible?

  72. Avatar
    Manohar Sai October 5, 2020 at 10:56 pm #

    Thanks for this great tutorial.
    Both losses for the discriminator have gone to zero in the first 100 epochs.
    Can you help me?

    • Avatar
      Jason Brownlee October 6, 2020 at 6:51 am #

      Perhaps restart training and stop once the generated images are good enough.

  73. Avatar
    Manohar October 5, 2020 at 11:18 pm #

    Great tutorial sir.
    Both of my discriminator losses head to zero in the first 200 steps. I have run it many times and cannot solve the issue. Can this be a problem with the version?

  74. Avatar
    Mick October 28, 2020 at 8:06 pm #

    Great tutorial!

    I am trying to apply this architecture to a MRI image-to-image translation task. I have two questions regarding the architecture for this purpose:
    1) After slicing the MRI data to 2D slices. Do I need to convert the NIFTI-files to JPEG or can I directly save them as npz (compressed numpy array)?
    2) MRI images are grayscale whereas the example code in this tutorial uses RGB images. What would change in the architecture of the tutorial to deal with grayscale images?

    Thanks Jason.

    • Avatar
      Jason Brownlee October 29, 2020 at 8:00 am #

      The model takes image data as numpy arrays. I don’t know if converting data to jpeg first is required for your data.

      Yes, you can adapt the model for grayscale images, e.g. change the number of channels for input images to D() and output from G().
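
On the data side, a small sketch of what that can look like: grayscale slices stored as (n, height, width) arrays get an explicit channel axis, so the image shape handed to the model-defining functions ends in 1 instead of 3. The helper name `add_channel_axis` and the zero-filled array are hypothetical stand-ins. The generator's output layer would also need a single output channel, which is not shown here.

```python
import numpy as np

def add_channel_axis(images):
    """Turn (n, height, width) grayscale arrays into (n, height, width, 1)."""
    images = np.asarray(images)
    if images.ndim == 3:
        images = images[..., np.newaxis]
    return images

# stand-in for five 256x256 grayscale slices
slices = np.zeros((5, 256, 256), dtype=np.float32)
prepared = add_channel_axis(slices)

# this is the shape the models would be defined with
image_shape = prepared.shape[1:]
print(image_shape)
```

Arrays prepared this way can be saved in compressed NumPy format in the same way as the tutorial's dataset.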

  75. Avatar
    Adrien November 2, 2020 at 7:08 am #


    What an incredible article. I reproduced your methodology on a research project on mechanical networks, where the model learns to draw mechanical linkages between parts of the system. It works perfectly, despite a small sample of training images.

    I re-used one of your images (the Unet architecture of the Generator) on a blog post I made on Medium, carefully citing your article as source and your work as reference. You can check it out here:

    I wanted to make sure you approved the re-use of the image in question. Thanks again for your work here and more broadly on Machine Learning Mastery.

  76. Avatar
    ZiZi November 11, 2020 at 2:13 am #

    Thank you for your great tutorial
    I read a few posts about GANs and realized that GANs are applied to square images. Is that right? Can I use them for non-square images?

  77. Avatar
    Rekka Mastouri December 1, 2020 at 11:04 pm #

    Thank you for your great tutorial.

    please how can I use GAN for deformable image registration?

    Thanks Jason.

    • Avatar
      Jason Brownlee December 2, 2020 at 7:44 am #

      You’re welcome.

      Perhaps start by checking the literature for existing approaches and try them first.

    • Avatar
      Annie May 29, 2022 at 3:35 pm #

      hi, Rekka, do you find any method to do with the deformable images? thanks

  78. Avatar
    Zaineb December 2, 2020 at 11:15 pm #

    Hope you are doing good.

    I have tried your code and it works perfectly well.
    I need to know, how about testing this module on a separate dataset,because i have found out that most of segmentation algorithms using gans include testing dataset also.

    If I use a part of the validation dataset (and call it my test dataset) with a saved model (e.g. model_109600.h5), the results are fine. But if I use a different test dataset, the segmentation results are not desirable.

    I would be glad if you can shed some light on this. Also please tell me, is there any way that this algorithm can be tested on a test dataset? If not, is there any reference that signifies that testing pix2pix for image to image translation is not a good choice?


    • Avatar
      Jason Brownlee December 3, 2020 at 8:18 am #


      Sorry, I don’t have an example of combining GAN output with a predictive model – I don’t think I can give you good off the cuff advice on the topic. Perhaps check the literature.

  79. Avatar
    Michal December 16, 2020 at 12:15 pm #

    Hey, really well explained, good job!
    I have implemented a similar cGAN for b&w image colorization. It is very hard to train, and after many, many epochs on big datasets I got some ‘good enough’ results, but I wonder how I can measure accuracy for translated images?

    Also, during training and after finishing it, my cGAN gives very big generator losses, like 10.0 and 2.0 at the end of training. The discriminator’s loss is near 0, sometimes peaking to 3 or even 5. How can I measure the accuracy of the trained model, or during training?

  80. Avatar
    Muhammad Gulfam December 27, 2020 at 7:34 am #

    Hi Jason,
    Thank you very much for detailed explanation with examples. It is very helpful.
    I am trying to edit the code through notepad++ but it is giving me indentation error. Seems like there are a mix of spaces and tabs.
    Can you please tell me what IDE or editor you used?
    Apologies for a silly question.

  81. Avatar
    Gavin December 28, 2020 at 6:01 am #

    Amazing tutorial, and even more impressive that you’ve responded to every comment over a year later! Quick question: you said that if either discriminator loss plateaus at 0 for an extended period of time, training has most likely failed and should be restarted. I am running it for the third time and both have landed on zero again. Am I doing something wrong? Is there anything I can do to improve the chances of it succeeding, or should I just keep trying? (P.S. I am using different images, and 2000 images as opposed to your ~1100 (still 100 epochs), but I assume that this does not affect the base of the model.) Thanks in advance.

    • Avatar
      Jason Brownlee December 28, 2020 at 6:06 am #


      Sorry to hear that.

      Perhaps try fewer epochs?
      Perhaps try changing other learning hyperparameters?
      Perhaps try adjusting the architecture?
      Perhaps try some of the ideas here:

      • Avatar
        Gavin December 30, 2020 at 10:34 am #

        Just an update, tried some things you suggested, in the article as well as in the well appreciated comment, not much changed. Let it run just to see what would happen and even though the model read 0 for both discriminators for more than 6 epochs, it still gave me decent results, so I’m happy. Thanks for the amazing article and the helpful advice, will definitely be reading up on some of your other articles.

  82. Avatar
    Eric January 2, 2021 at 4:51 am #

    Hi, I love your whole blog and tutorials!

    Just a question, Is it possible to train a model that uses 2 source images for one target?
    For example from a traditional satellite image + an Infra Red (IR) image recreate the corresponding map?

    Thanks a lot

    • Avatar
      Jason Brownlee January 2, 2021 at 6:27 am #


      I don’t see why not. I expect there are papers on exactly this – I recommend seeking them out to get ideas.

  83. Avatar
    Wass January 3, 2021 at 7:54 am #

    Thank you very much for the amazing tutorial!
    My question is if it is possible to continue training from a saved model ? what would be the inputs of train function ? Thanks again

    • Avatar
      Jason Brownlee January 3, 2021 at 1:29 pm #

      You’re welcome.

      Yes, you can load the saved model and continue training. You can use the same code as the first round of training as a starting point.

  84. Avatar
    D.GHOSH January 11, 2021 at 6:02 am #

    Is this model applicable to generate super resolution data?

    • Avatar
      Jason Brownlee January 11, 2021 at 6:23 am #

      No, I believe there are more specialized models for that problem described in the literature.

      • Avatar
        Muhammad Gulfam January 12, 2021 at 8:38 am #

        Can you please share the link of some articles for those specialized models? for generating super resolution data.

  85. Avatar
    Muhammad Gulfam January 12, 2021 at 8:06 am #

    What is the significance of converting the pixel values from [0, 255] to [-1, 1]?
    Is it because of the tanh activation function being used in the generator model for the last layer?
    This architecture can be used for matrix-to-matrix mapping as well, but a matrix might have real-valued entries (arr[row, col]) in [0, inf] instead of [0, 255]. In that case, what would you suggest for the transformation to [-1, 1]? Should that still be done?

    Apologies for multiple questions.

    • Avatar
      Jason Brownlee January 12, 2021 at 10:31 am #

      Yes, exactly.

      Yes, it is standard practice to use tanh for the output layer of gan generator models and to scale data to match the distribution of the activation function.
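
A minimal sketch of the scaling and its inverse, written for an arbitrary value range. For [0, 255] pixels it gives the same result as subtracting 127.5 and dividing by 127.5. For real-valued matrices, taking the bounds from the data, as done below, is an assumption made for illustration; values outside those bounds would land outside [-1, 1]. The function names are hypothetical.

```python
import numpy as np

def scale_to_tanh_range(x, lo, hi):
    """Map values from [lo, hi] linearly onto [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def scale_from_tanh_range(x, lo, hi):
    """Map values from [-1, 1] back onto [lo, hi]."""
    x = np.asarray(x, dtype=np.float64)
    return (x + 1.0) / 2.0 * (hi - lo) + lo

# image pixels with a known range
pixels = np.array([0.0, 127.5, 255.0])
scaled = scale_to_tanh_range(pixels, 0.0, 255.0)
restored = scale_from_tanh_range(scaled, 0.0, 255.0)

# real-valued matrix: bounds taken from the data itself (an assumption)
matrix = np.array([[0.0, 3.5], [12.0, 40.0]])
lo, hi = matrix.min(), matrix.max()
matrix_scaled = scale_to_tanh_range(matrix, lo, hi)
print(scaled, restored, matrix_scaled)
```

The same lo and hi have to be kept and reused to map the generator output back to the original units.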

      • Avatar
        Muhammad Gulfam January 12, 2021 at 5:05 pm #

        Thank you very much. I appreciate the responses.

  86. Avatar
    Muhammad Gulfam January 15, 2021 at 5:11 am #

    I have noticed in the code that the discriminator model is compiled and the GAN model is also compiled, but the generator model is not compiled, and the generator is what gets saved. Whenever I load the generator model for prediction, it generates a warning saying
    “No training configuration found in save file: the model was *not* compiled. Compile it manually”
    Can you please guide me if it can affect model’s performance? seems like my models are not working.
    After googling it I got a perception that it is just a warning but still wanted to check with you.


    • Avatar
      Jason Brownlee January 15, 2021 at 5:59 am #

      No need as we are not training it directly. You can ignore the warning.

  87. Avatar
    Muhammad Gulfam January 15, 2021 at 1:25 pm #

    Thank you for the response.

  88. Avatar
    WinK January 21, 2021 at 2:43 pm #

    I am fan of your site. Always thanks for your great article.

    I would like to ask about the loss functions that you use in the composite GAN model. In your code block, two loss functions are used in the define_gan function.

    model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])

    If I understand correctly, ‘mae’ takes labels (true and predicted labels) instead of images. But in the pix2pix paper, l1 loss was defined as follows:

    L1(G) =Ex,y,z[‖y−G(x,z)‖]

    The output of G model is image and their loss function is defined based on differences between true images and generated images instead of labels.

    Is it the same effect with labels instead of using images?

    • Avatar
      Jason Brownlee January 22, 2021 at 7:15 am #

      Yes, MAE is the L1 norm between image pixels.
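
To make that concrete, here is a small NumPy sketch in which the "true" and "predicted" arrays are whole images and the error is the mean absolute difference between their pixel values. The function name and the toy arrays are hypothetical. Note that the mean differs from the L1 norm in the paper's formula only by a constant factor, the number of values per image.

```python
import numpy as np

def l1_per_image(y_true, y_pred):
    """Mean absolute pixel difference for each image in a batch."""
    diff = np.abs(y_true - y_pred)
    return diff.reshape(diff.shape[0], -1).mean(axis=1)

# two tiny 4x4 RGB "images": the targets are all zeros
target = np.zeros((2, 4, 4, 3))
generated = np.zeros((2, 4, 4, 3))
generated[0] += 0.5            # every value is off by 0.5
generated[1, 0, 0, 0] = 1.0    # a single value out of 48 is off by 1.0

errors = l1_per_image(target, generated)
print(errors)
```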

      • Avatar
        WinK January 25, 2021 at 12:54 pm #

        Thank you for your answer.

  89. Avatar
    Muhammad Gulfam February 1, 2021 at 12:02 pm #

    Hi Dr. Brownlee,
    In your last version there was a line in the define_gan method:
    # make weights in the discriminator not trainable
    d_model.trainable = False
    My question is: if the discriminator is not trainable, then how will it improve?
    In the current version of your code you have replaced it with the following lines:
    # make weights in the discriminator not trainable
    for layer in d_model.layers:
        if not isinstance(layer, BatchNormalization):
            layer.trainable = False
    If the weights are not trainable, then how will the discriminator learn, get better, and help make the generator better?
    My understanding was that the weights are what is supposed to be trained in the training process. Please correct me if I am wrong. Apologies, as I am not an expert. I am learning through your articles and other material.
    Thanks in advance.

    • Avatar
      Jason Brownlee February 1, 2021 at 1:48 pm #

      The D is not trainable only when it is part of the composite model. This is called layer/model freezing. It is still trainable as a standalone model.

      • Avatar
        Muhammad Gulfam February 2, 2021 at 2:30 am #

        Thank you so much Dr. Brownlee.

  90. Avatar
    Glenn Q February 16, 2021 at 10:23 am #

    Hi Dr. Brownlee, if I want to have a higher learning rate for the discriminator and a lower one for the generator, say 2e^-4 for discriminator and 1e^-4 for the generator, should I just change the learning rate setting of the composite model?

    • Avatar
      Jason Brownlee February 16, 2021 at 1:38 pm #

      Yes, the composite model is used to update the generator.

      Let me know how you go with your approach.

  91. Avatar
    Alice February 17, 2021 at 8:19 pm #

    Hi Jason,

    Thank you for your great tutorial.
    I just want to ask you one question: why during the inference we have to keep the batch norm and dropout in the training mode?
    I understand that the dropout is performed to add some noise, but I thought it was necessary only for the training part.

    Moreover, I performed the training with a batch size of 1, and in the prediction phase I applied the generator to a volume of stacked images of dimension [N, 256, 256, 3], and the results were very different. Using a batch size of 1 in the prediction phase gave me better results. I think this is related to running BN in training mode.

    Thank you for your time

    • Avatar
      Jason Brownlee February 18, 2021 at 5:14 am #

      No. Batchnorm and dropout are flipped to inference mode. Batchnorm will use learned mu and sigma and dropout will stop dropping out.

      • Avatar
        Alice February 23, 2021 at 10:16 pm #

        but the flag training is set to True for both BN and dropout, I think that this flag makes them work as during the training phase

  92. Avatar
    anarchitect February 28, 2021 at 5:21 am #


    Is it possible to plot losses in realtime? I couldn’t manage to do it. Could you please help me?

  93. Avatar
    Lin April 10, 2021 at 4:55 pm #

    Hello, thank you for the sharing.
    I’d like to know the meaning of each loss value in d1[0.362] d2[0.405] g[78.143].
    Does a discriminator loss value close to zero mean the image is fake?
    And what does the composite loss value mean?

  94. Avatar
    Nada April 27, 2021 at 1:39 pm #

    Hi Jason,

    Thank you very much for this informative article.

    I have a question: what is a good GAN model for creating more synthetic images from a small set of medical images? Is StyleGAN good for this problem?

    • Avatar
      Jason Brownlee April 28, 2021 at 6:00 am #

      Perhaps trial a few methods and discover what works well or best for your dataset.

  95. Avatar
    Rishabh singh April 27, 2021 at 11:53 pm #


    If possible, can you please share the .h5 model after complete training? I am trying, but I am not able to train my model fully due to low computation power.
    I have tried on Colab too, but it gets stopped after some time.

  96. Avatar
    Ramesh Vishwasrao May 11, 2021 at 9:36 pm #

    Hey Jason,
    This was an awesome tutorial.
    I wanted to try this code and installed the necessary libraries. I don’t have a GPU on my machine, so I decided to train for fewer epochs in one go, i.e. I run the train function for 5 epochs, save the models, and the next day load the same models and train for the next 5 epochs. (I am doing this because 5 epochs already take a long time in one day and my machine heats up a lot.)

    I created a few new functions for loading the previously trained models.
    I did not alter any of your code, except for the summarize performance function and reducing n_epochs in the train function.

    I saved the d_model, g_model, gan_model and plot after each epoch.
    Then for the next set I loaded the most recently trained models and proceeded with the next set of epochs.
    But after 3 sets, i.e. 15 epochs, from the 16th epoch onwards the discriminator error started converging to zero. I tried two more sets, but it did not improve, and the output quality did not improve either.

    I dont know what the problem is.
    Do i need to save more models than these 3(g_model, d_model, gan_model) or do i need to save any more data/model/parameter ?
    Can you help me with this? (like what’s causing the problem)

    • Avatar
      Jason Brownlee May 12, 2021 at 6:12 am #

      Perhaps try running it again and see if you get the same problem, sometimes training GANs fails for no reason.

      • Avatar
        Ramesh Vishwasrao May 13, 2021 at 6:15 am #

        Thanks a lot for replying ..!

        Actually, I just ran the training once again and realized that these two warnings had also shown up when I was training before:
        “warnings.warn('No training configuration found in save file: '”
        “warnings.warn('Error in loading the saved optimizer '”

        I am using the model.save(path+model_name.h5) function to save the models.

        Do you think this is what is causing it?

        After I load the latest available model to train it again, do I also need to add an optimizer manually?
        Like this:

        for d_model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss='binary_crossentropy', optimizer=opt, loss_weights=[0.5])

        for gan_model
        opt = Adam(lr=0.0002, beta_1=0.5)
        model.compile(loss=['binary_crossentropy', 'mae'], optimizer=opt, loss_weights=[1,100])


        • Avatar
          Jason Brownlee May 14, 2021 at 6:15 am #

          Maybe, but I don’t think the warnings are relevant.

          Good question. Perhaps try with and without re-defining the optimizer. I suspect re-defining it would start it off at a new learning rate and might wash away your model weights. Experiment to see what is appropriate.

        • Avatar
          Yutao Chen June 22, 2021 at 12:57 am #

          You should save the optimizer when you save your model. If you define a new optimizer, you’ll lose all the internal “momentum” built up during the previous training.
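
To make the point about lost “momentum” concrete, here is a minimal scalar sketch of the Adam update rule. It is plain Python, not Keras code. The learning rate and beta_1 are taken from the comment above; beta_2=0.999 and eps=1e-7 are assumed defaults, and the gradient values are made up purely for illustration.

```python
import math

def adam_step(grad, state, lr=0.0002, beta_1=0.5, beta_2=0.999, eps=1e-7):
    """One scalar Adam step: returns (parameter update, new state)."""
    m, v, t = state
    t += 1
    m = beta_1 * m + (1.0 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction for early steps
    v_hat = v / (1.0 - beta_2 ** t)
    update = -lr * m_hat / (math.sqrt(v_hat) + eps)
    return update, (m, v, t)

FRESH_STATE = (0.0, 0.0, 0)

# Hypothetical history: 20 large gradients seen before training was stopped.
state = FRESH_STATE
for _ in range(20):
    _, state = adam_step(10.0, state)

# The first gradient after restarting is small (again, a made-up value).
next_grad = 0.1
resumed_update, _ = adam_step(next_grad, state)        # optimizer state was kept
fresh_update, _ = adam_step(next_grad, FRESH_STATE)    # optimizer was re-created

print("update with saved state:  ", resumed_update)
print("update with new optimizer:", fresh_update)
```

With a re-created optimizer, the first step has a magnitude close to the full learning rate whatever the size of the gradient, whereas the saved state scales the step by the gradient history. Whether this difference is enough to explain the behaviour reported above is not established here.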

  97. Avatar
    mehranm May 17, 2021 at 12:03 am #

    Hello. Thanks for sharing.
    I’d like to train a pix2pix model to segment crack images, but I have some problems in training. During the training process, the discriminator loss was decreasing while the generator loss was increasing. As a result, the model was not trained well.
    Can anyone guide me?

    • Avatar
      Jason Brownlee May 17, 2021 at 5:38 am #

      I don’t think pix2pix is appropriate if your goal is to segment images. Consider a Mask R-CNN.

  98. Avatar
    sukhan May 19, 2021 at 4:33 pm #

    Hi Jason, thanks for the wonderful article!
    I want to implement the same for my problem, which is handwritten text line segmentation. I have a dataset of handwritten documents and corresponding ground truth created with boundary lines for each line in the document.
    Can I use this method to map the handwritten document images to target handwritten document images with the boundaries of the text lines drawn?
    The motive is to segment the text lines in a handwritten document. I have 200 document images.
    Kindly reply, it would be really helpful. Also, what other approaches can I use to modify this GAN?

  99. Avatar
    sukhan May 20, 2021 at 9:02 pm #

    How can we make modifications to this network, such as other options for the generator and discriminator, while the task stays the same (image translation)?
    Can we use the concept of transfer learning in this?

    • Avatar
      Jason Brownlee May 21, 2021 at 5:59 am #

      Yes, perhaps try adapting it for your use case using trial and error.

      • Avatar
        sukhan June 2, 2021 at 4:36 pm #

        Hi Jason, I tried it, but the generated images with boundaries are different from the source images given; the content of the image (text document) gets changed, and I don’t know why it is happening.
        That is, the source image given for segmentation and the resulting image (translated/generated image) with segmentation are different.
        Please help.

        • Avatar
          Jason Brownlee June 3, 2021 at 5:30 am #

          Sorry to hear that, you may need to experiment a little, or prototype some alternatives.

  100. Avatar
    Syd Rawat May 24, 2021 at 2:43 pm #

    Hi Jason,

    Thanks for making the code open source. I was wondering, is there any way to visualize the intermediate activation maps of the trained network? I mean, as the data flows through the trained network model?

  101. Avatar
    Alin May 29, 2021 at 6:27 am #

    Hi Jason,
    I managed to use and train the network, thanks a lot!

    I have a question though, why is the binary cross entropy used in this case? Why not MSE?
    I did not find it (binary cross entropy) in the original paper of Isola et al or the code…. Are there any benefits and do you have a paper for that I could look into?

    Thank you!

    • Avatar
      Jason Brownlee May 29, 2021 at 6:56 am #

      Well done!

      Offhand, I believe I used the same loss as the paper.

      Yes, there is a difference and it often matters a great deal for the model and application. Nevertheless, try changing it up and compare results.

  102. Avatar
    Jojo July 6, 2021 at 1:43 pm #

    Thank you for such a well-written article. Learnt a lot from this. Also, after reading, I developed my own pix2pix application: converting image to ASCII art. Your feedback would be great 🙂
    My article here: https://jojo96.medium.com/generating-ascii-art-with-pix2pix-gans-dbee268b156a

  103. Avatar
    Laurin Herbsthofer August 4, 2021 at 8:03 pm #

    Hi Jason, thanks for the great tutorial! It helped me to understand how GANs work.

    For others who want to try the tutorial: the link provided to download the maps data from pix2pix is no longer working. However, it is still contained in this Kaggle dataset: https://www.kaggle.com/vikramtiwari/pix2pix-dataset

    Unfortunately I didn’t yet get great results with your code as-is (tensorflow v2.4). I restarted training many times, and only once got barely meaningful images after the first few epochs followed by mode collapse. Indeed, almost always I get mode collapse early on which does not get resolved even after 100 epochs. I tried many things already, like label smoothing, reducing learning rate, skipping training of the discriminator in some epochs, changing the data set sample size and some others but without success. The best I ever got was training only on 10 samples (trying to let the generator overfit the data), which makes me think that in principle the data set and the setup is ok, but I could never repeat those results, especially not on the full data set: https://drive.google.com/file/d/1swhpIqQhc-fCoftySuDAgETH2z6RxvKz/view?usp=sharing

    Do you think that something in tensorflow has changed since you released this tutorial? Or maybe the data set from kaggle is not actually the same? I’m running out of ideas to make it work 🙂

    Thanks again for all the great work you do and the awesome and easy-to-follow tutorials.

    All the best, Laurin

  104. Avatar
    Laurin Herbsthofer August 4, 2021 at 8:30 pm #

    Oh, the link probably still works, but since it’s http instead of https some browsers may not allow direct download, so no need to go to Kaggle, oops 🙂

    • Avatar
      Jason Brownlee August 5, 2021 at 5:18 am #


    • Avatar
      mahady hasan rayhan March 18, 2022 at 10:56 pm #

      What is your env config?
      I mean the TensorFlow, Python, Keras, CUDA, and cuDNN versions?

  105. Avatar
    Marja August 6, 2021 at 10:06 pm #

    Hi Jason,

    Is it necessary for the source and target images to have the same range of values? For example, if the source image has the values in the range of [-0.7,0.7] and the target image in the range [-1,1]. Or should both be in the same range?

    I’m asking since the training data I have consists of floats with a wide range that need to be scaled to values within the range [-1,1]. But to leave a little bit of space for my test data, which could possibly have a min/max outside the range of the training data, I’m scaling it to [-0.7,0.7]. However, my target data is just a black and white mask image, so it will always be in the range [0,255]. Therefore it can simply be scaled to [-1,1]. But I’m not sure whether it is correct to do that, or whether the target data should be ‘compliant with’ the source data and should also be scaled to [-0.7,0.7].

    I hope you understand my question and want to answer it.

    Thank you for a great tutorial!

    Kind regards,


  106. Avatar
    Lisa August 13, 2021 at 10:50 pm #

    Hi Jason,

    I have a class-imbalanced dataset (with two classes). I know there are loss functions better suited for imbalanced datasets than the binary cross entropy used in this model, for example binary focal dice loss. But I’m wondering if changing the loss function for this GAN model will make things worse. Do you think it’s possible to improve the GAN by changing the loss function? Or should I just stick to under/oversampling and/or data augmentation of my dataset?

    • Adrian Tam
      Adrian Tam August 14, 2021 at 3:30 am #

      Generally, whether to change a loss function or a hyperparameter can be decided by asking whether you can tie the decision to the problem you are solving. For example: why do I not want to use binary cross entropy? Because the data are imbalanced and the entropy will not improve significantly even if my model is significantly better. By answering questions like this for yourself, you can tell whether you are making a good decision.

  107. Avatar
    Marja August 25, 2021 at 11:15 pm #

    Hi Jason,

    I’m wondering about saving the model and then continuing training at a later time. In the summarize-performance function only the generator model is saved, but not the discriminator or the GAN/combined model. If I then want to continue training, for example to reach epoch 150 instead of 100, is it sufficient to use the trained generator which I saved at epoch 100 together with a discriminator and GAN model that were not saved? Or do you have to save the trained discriminator and GAN model as well at that epoch?

    • Adrian Tam
      Adrian Tam August 27, 2021 at 5:36 am #

      Better to save both, as a GAN is the orchestrated work of the generator and discriminator together.

  108. Avatar
    Bernat September 21, 2021 at 4:31 am #

    Thanks for this course !

    How can we use this on 720p images (720×1280)?
    Because this works just for square images.


    • Adrian Tam
      Adrian Tam September 21, 2021 at 9:41 am #

      You can modify the input shape and everything should just work (as long as you get the data in the corresponding size to train it). Alternatively, you can treat your 720p images as composed of many small squares, apply this model to each square, then stitch the results back together.
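
The tiling idea can be sketched in NumPy as below. This is only an illustration under stated assumptions: the 256×256 tile size matches the model's input, the image is padded by repeating edge pixels because 720 is not a multiple of 256, and the function names are hypothetical. The generator call is left as a comment.

```python
import numpy as np

def split_into_tiles(image, tile=256):
    """Pad an (H, W, C) image to a multiple of `tile` and cut it into square tiles."""
    h, w = image.shape[:2]
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    rows, cols = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = [padded[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
             for r in range(rows) for c in range(cols)]
    return np.stack(tiles), (rows, cols), (h, w)

def stitch_tiles(tiles, grid, original_size):
    """Reassemble tiles (row-major order) and crop away the padding."""
    rows, cols = grid
    h, w = original_size
    strips = [np.concatenate(list(tiles[r * cols:(r + 1) * cols]), axis=1)
              for r in range(rows)]
    full = np.concatenate(strips, axis=0)
    return full[:h, :w]

image = np.random.rand(720, 1280, 3).astype("float32")
tiles, grid, size = split_into_tiles(image)
# translated = g_model.predict(tiles)  # hypothetical: translate every tile
restored = stitch_tiles(tiles, grid, size)
print(tiles.shape, grid, restored.shape)
```

Translating tiles independently may leave visible seams at tile borders; overlapping tiles and blending is one possible remedy, not covered here.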

  109. Avatar
    Rom October 7, 2021 at 5:58 pm #

    I’m trying to keep training the model after I stopped the training, by reloading the model into the training function. How can I pass the whole h5 model at once instead of g_model, d_model and gan_model? What changes exactly do I need to make?
    Thanks a lot!

    • Adrian Tam
      Adrian Tam October 12, 2021 at 12:22 am #

      If the code saved the models separately, you need to load them separately. I believe it should not be difficult to write a function that loads each model one by one and returns them all in one shot. What do you think?
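
A minimal sketch of such a function is shown below. The file-naming scheme and the function names are hypothetical; adapt them to however the models were saved. The loader defaults to Keras’ load_model and can be swapped out, which also allows the path logic to be checked without TensorFlow installed.

```python
import os

MODEL_NAMES = ("g_model", "d_model", "gan_model")

def checkpoint_paths(directory, epoch):
    """Build the (hypothetical) file names used for one saved epoch."""
    return {name: os.path.join(directory, "%s_%03d.h5" % (name, epoch))
            for name in MODEL_NAMES}

def load_checkpoint(directory, epoch, loader=None):
    """Load all three models for an epoch and return them together."""
    if loader is None:
        # Imported lazily so the path helper works without TensorFlow installed.
        from tensorflow.keras.models import load_model
        loader = load_model
    paths = checkpoint_paths(directory, epoch)
    return tuple(loader(paths[name]) for name in MODEL_NAMES)

print(checkpoint_paths("checkpoints", 15)["g_model"])
# g_model, d_model, gan_model = load_checkpoint("checkpoints", 15)
```

One thing worth checking, stated here as an assumption rather than a verified fact: a gan_model loaded from its own file may hold its own copies of the generator and discriminator layers instead of sharing them with the separately loaded g_model and d_model. If that is the case, rebuilding the composite model from the loaded pair (for example with the tutorial’s define_gan() function) would be needed, at the cost of a fresh optimizer for the composite model.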

  110. Avatar
    Katy Huang October 13, 2021 at 6:05 pm #

    Thanks for this amazing tutorial!

    I want to ask about the code that generates real class labels in the function generate_real_samples.

    If an RGB image as a numpy array has the shape (number of images, width, height, bands), why is the number of color bands for the real class labels set to one? Isn’t the image in RGB mode?
    Also, if the color band is set to one, then the input and labels won’t have the same shape.

    The same question is in the generating fake samples section. Thanks!

    • Adrian Tam
      Adrian Tam October 14, 2021 at 4:24 am #

      That function is used with the discriminator model. That shape is what the labels need to be so they fit the output layer. If you change the output layer of the d_model, you would change that shape as well.
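
The shape relationship can be illustrated with a short sketch. It assumes the discriminator downsamples a 256×256 input with four stride-2 convolutions followed by two stride-1 convolutions, all with 'same' padding, and ends in a single output channel; check these assumptions against d_model.output_shape in your own code. The function names are hypothetical.

```python
import numpy as np

def patch_output_size(image_size, strides):
    """Spatial size after a stack of 'same'-padded convolutions with the given strides."""
    size = image_size
    for stride in strides:
        size = -(-size // stride)  # ceiling division, as with padding='same'
    return size

# Assumed discriminator layout: four stride-2 convolutions, then two stride-1.
n_patch = patch_output_size(256, strides=(2, 2, 2, 2, 1, 1))

def real_labels(n_samples, patch_shape):
    """One 'real' label (1.0) per output patch, not per pixel or colour band."""
    return np.ones((n_samples, patch_shape, patch_shape, 1))

y = real_labels(4, n_patch)
print(n_patch, y.shape)
```

The final dimension of 1 is therefore the discriminator’s single real/fake score per patch, not a colour channel of the image.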

  111. Avatar
    anna October 14, 2021 at 10:35 pm #

    Hi Jason. I see that in the training loop you prefer to use:

    d_loss1 = d_model.train_on_batch([X_realA, X_realB], y_real)
    d_loss2 = d_model.train_on_batch([X_realA, X_fakeB], y_fake)

    Could you please explain to me why it works?

    More specifically, I understand that we need the sum of d_loss1 and d_loss2. But I think this method first gives d_loss1 (then updates the weights) and then d_loss2 (and then updates the weights again). So this is not a ‘common’ loss function (== d_loss1 + d_loss2).

    • Adrian Tam
      Adrian Tam October 20, 2021 at 7:06 am #

      Yes, that’s for illustration purposes. But given that the weight update in each iteration is not supposed to be big, your concern should not be very pronounced. You may also consider combining and shuffling X_realB and X_fakeB and calling train_on_batch() once. But then I can’t show the separate losses in the print() statement a few lines after.
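
The single-call alternative could look roughly like the sketch below, which only builds the combined batch with NumPy. The function name is hypothetical, the array shapes assume 256×256 RGB images and a 16×16 patch output, and the train_on_batch() call is left as a comment.

```python
import numpy as np

def combined_discriminator_batch(X_realA, X_realB, X_fakeB, y_real, y_fake, rng=None):
    """Merge real and fake pairs into one shuffled batch for a single update."""
    rng = np.random.default_rng() if rng is None else rng
    sources = np.concatenate([X_realA, X_realA], axis=0)  # same sources for both halves
    targets = np.concatenate([X_realB, X_fakeB], axis=0)
    labels = np.concatenate([y_real, y_fake], axis=0)
    order = rng.permutation(len(labels))
    return [sources[order], targets[order]], labels[order]

# Dummy arrays with the shapes assumed for one training step (batch size 1).
X_realA = np.zeros((1, 256, 256, 3))
X_realB = np.ones((1, 256, 256, 3))
X_fakeB = -np.ones((1, 256, 256, 3))
y_real = np.ones((1, 16, 16, 1))
y_fake = np.zeros((1, 16, 16, 1))

inputs, labels = combined_discriminator_batch(X_realA, X_realB, X_fakeB, y_real, y_fake)
# d_loss = d_model.train_on_batch(inputs, labels)  # one weight update instead of two
print(inputs[0].shape, inputs[1].shape, labels.shape)
```

Mixing real and fake samples in one batch changes what batch normalization sees in each update, so whether this trains as well as two separate calls is something to test rather than assume.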

  112. Avatar
    ertugrul November 13, 2021 at 5:04 am #

    First of all, thank you very much for posting this tutorial. With which method did you get the images side by side?

    • Adrian Tam
      Adrian Tam November 14, 2021 at 2:38 pm #

      I think you mean the picture at the beginning of this post. It is what the pictures in the original dataset look like, “each with the target size of 256×512 pixels”.

      • Avatar
        ertugrul November 20, 2021 at 8:10 pm #

        Which program did you use to create this dataset? ArcGIS? Photoshop?

        • Adrian Tam
          Adrian Tam November 21, 2021 at 7:51 am #

          That’s from the original paper.
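
For readers who want to build or take apart such side-by-side pairs themselves, a NumPy sketch follows. It does not describe how the dataset authors produced their files; it only assumes each combined image holds two equally sized pictures, source on the left and target on the right. The function names are hypothetical.

```python
import numpy as np

def split_pair(combined):
    """Split a side-by-side (H, 2W, C) image into its left and right halves."""
    half = combined.shape[1] // 2
    return combined[:, :half], combined[:, half:]

def join_pair(left, right):
    """Place two (H, W, C) images side by side as one (H, 2W, C) image."""
    return np.hstack([left, right])

# Dummy stand-ins for a satellite photo and its map image.
satellite = np.zeros((256, 256, 3), dtype="uint8")
road_map = np.full((256, 256, 3), 255, dtype="uint8")

combined = join_pair(satellite, road_map)
left, right = split_pair(combined)
print(combined.shape, left.shape, right.shape)
```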

  113. Avatar
    Gary Peng December 13, 2021 at 3:28 am #

    Hi Jason,

    Thanks for your tutorial.
    I have tried to use this model to do RGB to IR (infrared) image translation; however, the generated images have some white spot artifacts.
    Like in this discussion: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/411
    Someone told me that this may be because the preprocessing of the training and testing data is not the same, but I did the same for both (scale from [0,255] to [-1,1], and scale from [-1,1] to [0,1]).
    Do you have any suggestions about this issue?
    Thank you very much.

    • Adrian Tam
      Adrian Tam December 15, 2021 at 5:59 am #

      Not sure, but if that’s a preprocessing issue, maybe you can try narrowing the scale from [0,1] to [0.1,0.9] so you get some margin if your model overshoots.
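
The scaling steps discussed here can be written down as small NumPy helpers. This is a sketch with hypothetical function names; the clipping step and the [0.1, 0.9] mapping are one possible reading of the suggestion above, not a confirmed fix for the white spots.

```python
import numpy as np

def to_model_range(pixels):
    """Scale uint8 pixels from [0, 255] to [-1, 1]."""
    return (pixels.astype("float32") - 127.5) / 127.5

def to_display_range(generated):
    """Scale model output from [-1, 1] to [0, 1], clipping any overshoot."""
    return np.clip((generated + 1.0) / 2.0, 0.0, 1.0)

def to_narrow_range(unit, low=0.1, high=0.9):
    """Squeeze [0, 1] values into [low, high] to leave headroom at both ends."""
    return low + unit * (high - low)

pixels = np.array([0, 128, 255], dtype="uint8")
scaled = to_model_range(pixels)
print(scaled)
print(to_display_range(scaled))
print(to_narrow_range(to_display_range(scaled)))
```

Applying exactly the same helper functions to training and test images is a simple way to rule out a preprocessing mismatch.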

  114. Avatar
    Gary Peng December 13, 2021 at 11:31 pm #

    Hi Jason,
    Thanks for the tutorial.
    I have tried to use this model to do RGB to IR(infrared) image translation.
    But there are some white spots in my generated images, like the situation in this discussion: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/411
    Someone told me that this might be because the preprocessing of the training data and testing data is different, but I did the same for both (scale [0, 255] to [-1, 1] and [-1, 1] to [0, 1]).
    Do you have any suggestion on this?
    Thank you.

  115. Avatar
    aji January 5, 2022 at 5:41 pm #

    Thank you so much for this tutorial!! I want to use this code to generate images with single band. Can I use the same code by just changing the number of bands to generate images with 1 band? My source and target are single band images. Is it possible to generate single band images when the source is multi band image?

  116. Avatar
    Hind AlDabagh January 14, 2022 at 8:35 am #


    Thanks for this amazing tutorial.
    I don’t understand how the define_discriminator() function implements the 70×70 PatchGAN discriminator. Is there any tutorial to understand the math behind it and your parameters?

    • Avatar
      James Carmichael January 14, 2022 at 8:48 am #

      Hello Hind…Thank you for the feedback! The example presented in the tutorial is based largely upon the paper below:
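
As a rough aid to the question about the 70×70 figure, the receptive field of a stack of convolutions can be computed by working backwards from one output unit. The layer list below is an assumed reading of the 70×70 PatchGAN configuration (4×4 kernels, three stride-2 layers followed by two stride-1 layers) and is not copied from define_discriminator(); substitute the kernel sizes and strides from your own model to check it.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) conv layers."""
    size = 1
    for kernel, stride in reversed(layers):
        # Each layer widens the field seen by the layer above it.
        size = (size - 1) * stride + kernel
    return size

# Assumed 70x70 PatchGAN layout, listed from input to output.
patchgan_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_70))
```

Each value in the discriminator’s output map then scores one such patch of the input image as real or fake.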


  117. Avatar
    Aditi February 9, 2022 at 12:09 am #

    Hello, I am looking for the modifications that need to be made in the code to apply this pix2pix GAN to the frequency components of an image.

    Do you have any suggestions?

    Thank You

  118. Avatar
    Brock February 11, 2022 at 2:53 am #

    Hi Jason, thanks so much for this tutorial. I’m wondering if there is a way to constrain the output image to only black and white? I’m training a model in which the resulting output only needs to be a B&W alpha-like image and I thought maybe it would train a lot faster if it only has to produce a binary output with 1 bit pixels. Any guidance is much appreciated!

  119. Avatar
    Wolfgang Meyers February 12, 2022 at 7:35 am #

    Thanks so much for posting this, it was exactly what I was looking for. I kept getting results for image classification and style transfer, when I really want to train something to apply a specific kind of transformation to images.