How to Implement GAN Hacks in Keras to Train Stable Models

By Jason Brownlee on July 12, 2019 in Generative Adversarial Networks

Generative Adversarial Networks, or GANs, are challenging to train.

This is because the architecture involves both a generator and a discriminator model that compete in a zero-sum game, meaning that improvements to one model come at the cost of degraded performance in the other. The result is a very unstable training process that can often lead to failure, e.g. a generator that generates the same image all the time or generates nonsense.

As such, there are a number of heuristics or best practices (called “GAN hacks”) that can be used when configuring and training your GAN models. These heuristics have been hard won by practitioners testing and evaluating hundreds or thousands of combinations of configuration options on a range of problems over many years.

Some of these heuristics can be challenging to implement, especially for beginners.

Further, some or all of them may be required for a given project, although it may not be clear which subset of heuristics should be adopted, requiring experimentation. This means a practitioner must be ready to implement a given heuristic with little notice.

In this tutorial, you will discover how to implement a suite of best practices or GAN hacks that you can copy-and-paste directly into your GAN project.

After reading this tutorial, you will know:

The best sources for practical heuristics or hacks when developing generative adversarial networks.
How to implement seven best practices for the deep convolutional GAN model architecture from scratch.
How to implement four additional best practices from Soumith Chintala’s GAN Hacks presentation and list.

Let’s get started.

Tutorial Overview

This tutorial is divided into three parts; they are:

Heuristics for Training Stable GANs
Best Practices for Deep Convolutional GANs
1. Downsample Using Strided Convolutions
2. Upsample Using Strided Convolutions
3. Use LeakyReLU
4. Use Batch Normalization
5. Use Gaussian Weight Initialization
6. Use Adam Stochastic Gradient Descent
7. Scale Images to the Range [-1,1]
Soumith Chintala’s GAN Hacks
1. Use a Gaussian Latent Space
2. Separate Batches of Real and Fake Images
3. Use Label Smoothing
4. Use Noisy Labels

Heuristics for Training Stable GANs

GANs are difficult to train.

At the time of writing, there is no good theoretical foundation as to how to design and train GAN models, but there is established literature of heuristics, or “hacks,” that have been empirically demonstrated to work well in practice.

As such, there are a range of best practices to consider and implement when developing a GAN model.

Perhaps the two most important sources of suggested configuration and training parameters are:

Alec Radford, et al’s 2015 paper that introduced the DCGAN architecture.
Soumith Chintala’s 2016 presentation and associated “GAN Hacks” list.

In this tutorial, we will explore how to implement the most important best practices from these two sources.

Best Practices for Deep Convolutional GANs

Perhaps one of the most important steps forward in the design and training of stable GAN models was the 2015 paper by Alec Radford, et al. titled “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.”

In the paper, they describe the Deep Convolutional GAN, or DCGAN, approach to GAN development that has become the de facto standard.

We will look at how to implement seven best practices for the DCGAN model architecture in this section.

1. Downsample Using Strided Convolutions

The discriminator model is a standard convolutional neural network model that takes an image as input and must output a binary classification as to whether it is real or fake.

It is standard practice with deep convolutional networks to use pooling layers to downsample the input and feature maps with the depth of the network.

This is not recommended for the DCGAN, and instead, they recommend downsampling using strided convolutions.

This involves defining a convolutional layer as normal, but changing the two-dimensional stride from the default of (1,1) to (2,2). This downsamples the input, halving both its width and height, so that the output feature maps have one quarter the area.

The example below demonstrates this with a single hidden convolutional layer that uses downsampling strided convolutions by setting the ‘strides‘ argument to (2,2). The effect is the model will downsample the input from 64×64 to 32×32.

# example of downsampling with strided convolutions
from keras.models import Sequential
from keras.layers import Conv2D
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
# summarize model
model.summary()

Running the example shows the shape of the output of the convolutional layer, where the feature maps have one quarter of the area.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0
_________________________________________________________________
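
The numbers in the summary can be checked by hand. The sketch below assumes the usual rule for padding='same', where each spatial dimension becomes the input size divided by the stride, rounded up, and that each filter has one weight per kernel position per input channel plus a bias. The function names are hypothetical helpers, not part of Keras.

```python
# sketch: output size and parameter count for a strided Conv2D layer
from math import ceil

# with padding='same', each spatial dimension becomes ceil(input / stride)
def conv_output_size(size, stride):
    return ceil(size / stride)

# each filter has kernel_h * kernel_w * in_channels weights plus one bias
def conv_param_count(kernel, in_channels, filters):
    return (kernel[0] * kernel[1] * in_channels + 1) * filters

# the layer from the example: 64 filters, 3x3 kernel, stride 2, 64x64x3 input
out = conv_output_size(64, 2)
params = conv_param_count((3, 3), 3, 64)
print(out, params)
```

This should report a 32×32 output and 1,792 parameters, matching the model summary.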

2. Upsample Using Strided Convolutions

The generator model must generate an output image given a random point from the latent space as input.

The recommended approach for achieving this is to use a transpose convolutional layer with a strided convolution. This is a special type of layer that performs the convolution operation in reverse. Intuitively, this means that setting a stride of 2×2 will have the opposite effect, upsampling the input instead of downsampling it in the case of a normal convolutional layer.

By stacking a transpose convolutional layer with strided convolutions, the generator model is able to scale a given input to the desired output dimensions.

The example below demonstrates this with a single hidden transpose convolutional layer that uses upsampling strided convolutions by setting the ‘strides‘ argument to (2,2).

The effect is the model will upsample the input from 64×64 to 128×128.

# example of upsampling with strided convolutions
from keras.models import Sequential
from keras.layers import Conv2DTranspose
# define model
model = Sequential()
model.add(Conv2DTranspose(64, kernel_size=(4,4), strides=(2,2), padding='same', input_shape=(64,64,3)))
# summarize model
model.summary()

Running the example shows the shape of the output of the convolutional layer, where the feature maps have quadruple the area.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_transpose_1 (Conv2DTr (None, 128, 128, 64)      3136
=================================================================
Total params: 3,136
Trainable params: 3,136
Non-trainable params: 0
_________________________________________________________________
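
The same arithmetic can be sketched for the transpose convolution. The example below assumes padding='same', where each spatial dimension is multiplied by the stride, and uses hypothetical helper names. The 4×4 starting size in the stacking loop is an illustrative assumption, not a value from the paper.

```python
# sketch: output size and parameter count for a strided Conv2DTranspose layer

# with padding='same', each spatial dimension becomes input * stride
def transpose_output_size(size, stride):
    return size * stride

# each filter has kernel_h * kernel_w * in_channels weights plus one bias
def transpose_param_count(kernel, in_channels, filters):
    return (kernel[0] * kernel[1] * in_channels + 1) * filters

# the layer from the example: 64 filters, 4x4 kernel, stride 2, 64x64x3 input
out = transpose_output_size(64, 2)
params = transpose_param_count((4, 4), 3, 64)
print(out, params)

# stacking stride-2 layers doubles the size each time (assumed 4x4 start)
size = 4
sizes = [size]
while size < 64:
    size = transpose_output_size(size, 2)
    sizes.append(size)
print(sizes)
```

The first line of output should match the model summary, and the second shows how four stacked layers would take a 4×4 input up to 64×64.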

3. Use LeakyReLU

The rectified linear activation unit, or ReLU for short, is a simple calculation that returns the value provided as input directly, or the value 0.0 if the input is 0.0 or less.

It has become a best practice when developing deep convolutional neural networks generally.

The best practice for GANs is to use a variation of the ReLU that allows some values less than zero: negative inputs are multiplied by a small fixed slope rather than being set to zero. This is called the leaky rectified linear activation unit, or LeakyReLU for short.

A negative slope can be specified for the LeakyReLU and a value of 0.2 is recommended, in place of the Keras default of 0.3.

Originally, ReLU was recommended for use in the generator model and LeakyReLU was recommended for use in the discriminator model, although more recently, the LeakyReLU is recommended in both models.

The example below demonstrates using the LeakyReLU with a slope of 0.2 after a convolutional layer in a discriminator model.

# example of using leakyrelu in a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import LeakyReLU
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
model.add(LeakyReLU(0.2))
# summarize model
model.summary()

Running the example demonstrates the structure of the model with a single convolutional layer followed by the activation layer.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32, 32, 64)        0
=================================================================
Total params: 1,792
Trainable params: 1,792
Non-trainable params: 0
_________________________________________________________________
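
To make the activation itself concrete, the sketch below writes out the ReLU and LeakyReLU calculations for a single value in plain Python. The function names are hypothetical; the Keras layer applies the same rule to every element of its input.

```python
# sketch: the rectified linear and leaky rectified linear functions

# standard relu: negative inputs are set to zero
def relu(x):
    return x if x > 0.0 else 0.0

# leaky relu: negative inputs are scaled by a small fixed slope
def leaky_relu(x, alpha=0.2):
    return x if x > 0.0 else alpha * x

# compare the two on a few inputs
for x in [-2.0, -0.5, 0.0, 1.5]:
    print(x, relu(x), leaky_relu(x))
```

Positive inputs pass through both functions unchanged, whereas negative inputs are zeroed by the ReLU and only shrunk by the LeakyReLU.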

4. Use Batch Normalization

Batch normalization standardizes the activations from a prior layer to have a zero mean and unit variance. This has the effect of stabilizing the training process.

Batch normalization is used after the convolutional and transpose convolutional layers in the discriminator and generator models respectively.

It is added to the model after the hidden layer, but before the activation, such as LeakyReLU.

The example below demonstrates adding a Batch Normalization layer after a Conv2D layer in a discriminator model but before the activation.

# example of using batch norm in a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
model.add(BatchNormalization())
model.add(LeakyReLU(0.2))
# summarize model
model.summary()

Running the example shows the desired usage of batch norm between the outputs of the convolutional layer and the activation function.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 64)        1792
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32, 32, 64)        0
=================================================================
Total params: 2,048
Trainable params: 1,920
Non-trainable params: 128
_________________________________________________________________
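
The sketch below illustrates the standardization step for a single channel and accounts for the parameter counts in the summary. It assumes the training-time calculation (subtract the batch mean, divide by the square root of the variance plus a small epsilon) and made-up activation values. The layer additionally learns a scale and shift per channel, which is omitted here.

```python
# sketch: batch normalization statistics and parameter count for one layer
from statistics import mean, pvariance

# standardize one channel's activations across a batch
def standardize(values, eps=1e-3):
    mu = mean(values)
    var = pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

# hypothetical activations for a single channel
activations = [0.5, 2.0, -1.0, 3.5, 1.0]
normed = standardize(activations)
print(mean(normed), pvariance(normed))

# per channel: scale and shift are trained; moving mean and variance are not
channels = 64
trainable = 2 * channels
non_trainable = 2 * channels
print(trainable + non_trainable, trainable, non_trainable)
```

The standardized values should have a mean of about zero and a variance just under one, and the counts of 256, 128 and 128 line up with the batch normalization row and the non-trainable total in the summary.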

5. Use Gaussian Weight Initialization

Before a neural network can be trained, the model weights (parameters) must be initialized to small random variables.

The best practice for DCGAN models reported in the paper is to initialize all weights using a zero-centered Gaussian distribution (the normal or bell-shaped distribution) with a standard deviation of 0.02.

The example below demonstrates defining a random Gaussian weight initializer with a mean of 0 and a standard deviation of 0.02 for use in a transpose convolutional layer in a generator model.

The same weight initializer instance could be used for each layer in a given model.

# example of gaussian weight initialization in a generator model
from keras.models import Sequential
from keras.layers import Conv2DTranspose
from keras.initializers import RandomNormal
# define model
model = Sequential()
init = RandomNormal(mean=0.0, stddev=0.02)
model.add(Conv2DTranspose(64, kernel_size=(4,4), strides=(2,2), padding='same', kernel_initializer=init, input_shape=(64,64,3)))
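
As a rough check of what this initializer produces, the sketch below draws the same number of values as the layer has kernel weights using the Python standard library rather than Keras. The seed is arbitrary and the result only approximates what RandomNormal would generate.

```python
# sketch: drawing weights from a zero-centered gaussian with stddev 0.02
from random import gauss, seed
from statistics import mean, pstdev

# arbitrary seed so the run is repeatable
seed(1)
# number of kernel weights for a 4x4 kernel, 3 input channels and 64 filters
n_weights = 4 * 4 * 3 * 64
weights = [gauss(0.0, 0.02) for _ in range(n_weights)]
# summarize
print(len(weights), mean(weights), pstdev(weights))
```

The sample mean should be close to zero and the sample standard deviation close to 0.02.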

6. Use Adam Stochastic Gradient Descent

Stochastic gradient descent, or SGD for short, is the standard algorithm used to optimize the weights of convolutional neural network models.

There are many variants of the training algorithm. The best practice for training DCGAN models is to use the Adam version of stochastic gradient descent with the learning rate of 0.0002 and the beta1 momentum value of 0.5 instead of the default of 0.9.

The Adam optimization algorithm with this configuration is recommended when optimizing both the discriminator and generator models.

The example below demonstrates configuring the Adam stochastic gradient descent optimization algorithm for training a discriminator model.

# example of using adam when training a discriminator model
from keras.models import Sequential
from keras.layers import Conv2D
from keras.optimizers import Adam
# define model
model = Sequential()
model.add(Conv2D(64, kernel_size=(3,3), strides=(2,2), padding='same', input_shape=(64,64,3)))
# compile model
# note: older Keras versions name the first argument lr instead of learning_rate
opt = Adam(learning_rate=0.0002, beta_1=0.5)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
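
To see what the beta1 value controls, the sketch below applies the textbook form of the Adam update to a single weight. It is not the Keras implementation; the beta_2 and eps values are assumed defaults, the function name is hypothetical, and the gradients are made up.

```python
# sketch: adam updates for a single weight, with the dcgan settings
from math import sqrt

def adam_step(w, grad, m, v, t, lr=0.0002, beta_1=0.5, beta_2=0.999, eps=1e-7):
    # exponential moving averages of the gradient and the squared gradient
    m = beta_1 * m + (1.0 - beta_1) * grad
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2
    # correct the bias from initializing both averages at zero
    m_hat = m / (1.0 - beta_1 ** t)
    v_hat = v / (1.0 - beta_2 ** t)
    # step against the smoothed, rescaled gradient
    w = w - lr * m_hat / (sqrt(v_hat) + eps)
    return w, m, v

# made-up gradients that change sign halfway through
w, m, v = 0.0, 0.0, 0.0
for t, grad in enumerate([1.0, 1.0, -1.0, -1.0], start=1):
    w, m, v = adam_step(w, grad, m, v, t)
    print(t, m, w)
```

With beta_1 at 0.5, the moving average m gives half its weight to the newest gradient, so it reacts quickly when the gradient changes sign. With 0.9, older gradients would dominate for longer.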

7. Scale Images to the Range [-1,1]

It is recommended to use the hyperbolic tangent activation function as the output from the generator model.

As such, it is also recommended that real images used to train the discriminator are scaled so that their pixel values are in the range [-1,1]. This is so that the discriminator will always receive images as input, real and fake, that have pixel values in the same range.

Typically, image data is loaded as a NumPy array such that pixel values are 8-bit unsigned integer (uint8) values in the range [0, 255].

First, the array must be converted to floating point values, then rescaled to the required range.

The example below provides a function that will appropriately scale a NumPy array of loaded image data to the required range of [-1,1].

# example of a function for scaling images

# scale image data from [0,255] to [-1,1]
def scale_images(images):
	# convert from uint8 to float32
	images = images.astype('float32')
	# scale from [0,255] to [-1,1]
	images = (images - 127.5) / 127.5
	return images
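
Images produced by the generator will also be in the range [-1,1], so they typically need to be mapped back before plotting or saving. The sketch below shows the inverse arithmetic with hypothetical function names. It uses plain floats for illustration; the same expressions apply elementwise to a NumPy array.

```python
# sketch: mapping generator outputs in [-1,1] back to displayable ranges

# scale from [-1,1] to [0,1], e.g. for plotting float images
def unscale_to_unit(images):
    return (images + 1.0) / 2.0

# scale from [-1,1] to [0,255], the inverse of scale_images() above
def unscale_to_pixels(images):
    return (images * 127.5) + 127.5

# check the end points and the middle of the range
for value in [-1.0, 0.0, 1.0]:
    print(value, unscale_to_unit(value), unscale_to_pixels(value))
```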

Soumith Chintala’s GAN Hacks

Soumith Chintala, one of the co-authors of the DCGAN paper, made a presentation at NIPS 2016 titled “How to Train a GAN?” summarizing many tips and tricks.

The video is available on YouTube and is highly recommended. A summary of the tips is also available as a GitHub repository titled “How to Train a GAN? Tips and tricks to make GANs work.”

The tips draw upon the suggestions from the DCGAN paper as well as elsewhere.

In this section, we will review how to implement four additional GAN best practices not covered in the previous section.

1. Use a Gaussian Latent Space

The latent space defines the shape and distribution of the input to the generator model used to generate new images.

The DCGAN recommends sampling from a uniform distribution, meaning that the shape of the latent space is a hypercube.

The more recent best practice is to sample from a standard Gaussian distribution, meaning that the shape of the latent space is a hypersphere, with a mean of zero and a standard deviation of one.

The example below demonstrates how to generate 500 random Gaussian points from a 100-dimensional latent space that can be used as input to a generator model; each point could be used to generate an image.

# example of sampling from a gaussian latent space
from numpy.random import randn

# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
	# generate points in the latent space
	x_input = randn(latent_dim * n_samples)
	# reshape into a batch of inputs for the network
	x_input = x_input.reshape((n_samples, latent_dim))
	return x_input

# size of latent space
n_dim = 100
# number of samples to generate
n_samples = 500
# generate samples
samples = generate_latent_points(n_dim, n_samples)
# summarize
print(samples.shape, samples.mean(), samples.std())

Running the example summarizes the generation of 500 points, each comprised of 100 random Gaussian values with a mean close to zero and a standard deviation close to 1, i.e. a standard Gaussian distribution.

(500, 100) -0.004791256735601787 0.9976912528950904
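
One way to read the hypersphere description is that, in high dimensions, points drawn from a standard Gaussian sit close to a thin shell with a radius of about the square root of the number of dimensions. The sketch below checks this with the Python standard library instead of NumPy; the seed is arbitrary.

```python
# sketch: lengths of points drawn from a 100-dimensional standard gaussian
from math import sqrt
from random import gauss, seed
from statistics import mean, pstdev

# arbitrary seed so the run is repeatable
seed(1)
latent_dim, n_samples = 100, 500
# generate points in the latent space
points = [[gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]
# distance of each point from the origin
norms = [sqrt(sum(x * x for x in p)) for p in points]
# summarize
print(sqrt(latent_dim), mean(norms), pstdev(norms))
```

The average length should be close to 10 with a small spread, meaning almost no points fall near the origin.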

2. Separate Batches of Real and Fake Images

The discriminator model is trained using stochastic gradient descent with mini-batches.

The best practice is to update the discriminator with separate batches of real and fake images rather than combining real and fake images into a single batch.

This can be achieved by updating the model weights for the discriminator model with two separate calls to the train_on_batch() function.

The code snippet below demonstrates how you can do this within the inner loop of code when training your discriminator model.

...
# get randomly selected 'real' samples
X_real, y_real = ...
# update discriminator model weights
discriminator.train_on_batch(X_real, y_real)
# generate 'fake' examples
X_fake, y_fake = ...
# update discriminator model weights
discriminator.train_on_batch(X_fake, y_fake)
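
The sketch below shows the same pattern in a form that runs without a trained model. The RecordingDiscriminator class is a hypothetical stand-in that only records how it was called, the strings are placeholders for image arrays, and the batch size of 128 split into two half batches is an assumption.

```python
# sketch: two separate discriminator updates per training iteration

# hypothetical stand-in for a compiled keras model; it only records its calls
class RecordingDiscriminator:
    def __init__(self):
        self.calls = []

    def train_on_batch(self, X, y):
        self.calls.append((len(X), sorted(set(y))))

# update on an all-real batch, then on an all-fake batch
def update_discriminator(discriminator, X_real, X_fake):
    discriminator.train_on_batch(X_real, [1] * len(X_real))
    discriminator.train_on_batch(X_fake, [0] * len(X_fake))

# assumed batch size, split in half between real and fake
n_batch = 128
half_batch = n_batch // 2
# placeholder strings stand in for image arrays
X_real = ['real'] * half_batch
X_fake = ['fake'] * half_batch
d = RecordingDiscriminator()
update_discriminator(d, X_real, X_fake)
print(d.calls)
```

Each recorded call contains a single class label, confirming that real and fake images never share a batch.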

3. Use Label Smoothing

It is common to use the class label 1 to represent real images and class label 0 to represent fake images when training the discriminator model.

These are called hard labels, as the label values are precise or crisp.

It is a good practice to use soft labels, such as values slightly more or less than 1.0 for real images and slightly more than 0.0 for fake images, where the variation for each image is random.

This is often referred to as label smoothing and can have a regularizing effect when training the model.

The example below demonstrates defining 1,000 labels for the positive class (class=1) and smoothing the label values uniformly into the range [0.7,1.2] as recommended.

# example of positive label smoothing
from numpy import ones
from numpy.random import random

# example of smoothing class=1 to [0.7, 1.2]
def smooth_positive_labels(y):
	return y - 0.3 + (random(y.shape) * 0.5)


# generate 'real' class labels (1)
n_samples = 1000
y = ones((n_samples, 1))
# smooth labels
y = smooth_positive_labels(y)
# summarize smooth labels
print(y.shape, y.min(), y.max())

Running the example summarizes the min and max values for the smooth values, showing they are close to the expected values.

(1000, 1) 0.7003103006957805 1.1997858934066357
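
To get a feel for how a soft label changes what the discriminator is rewarded for, the sketch below evaluates binary cross-entropy for a single prediction against a hard target of 1.0 and a soft target of 0.9. The fixed value of 0.9 is an assumption for illustration (the function above draws random values instead), and targets above 1.0 are not covered.

```python
# sketch: binary cross-entropy for one prediction against hard and soft targets
from math import log

def binary_crossentropy(y_true, y_pred):
    return -(y_true * log(y_pred) + (1.0 - y_true) * log(1.0 - y_pred))

# candidate discriminator outputs between 0.01 and 0.99
candidates = [i / 100.0 for i in range(1, 100)]
best = {}
for target in [1.0, 0.9]:
    # find the output with the lowest loss for this target
    best[target] = min(candidates, key=lambda p: binary_crossentropy(target, p))
    print(target, best[target])
```

With the hard target, the loss keeps falling as the output approaches 1.0, whereas with the soft target the loss is lowest at an output equal to the target. This is one way to see why smoothing discourages an overconfident discriminator.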

Some practitioners suggest that only the positive-class labels need to be smoothed, and only to values less than 1.0. Nevertheless, you can also smooth negative class labels.

The example below demonstrates generating 1,000 labels for the negative class (class=0) and smoothing the label values uniformly into the range [0.0, 0.3] as recommended.

# example of negative label smoothing
from numpy import zeros
from numpy.random import random

# example of smoothing class=0 to [0.0, 0.3]
def smooth_negative_labels(y):
	return y + random(y.shape) * 0.3

# generate 'fake' class labels (0)
n_samples = 1000
y = zeros((n_samples, 1))
# smooth labels
y = smooth_negative_labels(y)
# summarize smooth labels
print(y.shape, y.min(), y.max())

4. Use Noisy Labels

The labels used when training the discriminator model are always correct.

This means that fake images are always labeled with class 0 and real images are always labeled with class 1.

It is recommended to introduce some errors to these labels where some fake images are marked as real, and some real images are marked as fake.

If you are using separate batches to update the discriminator for real and fake images, this may mean randomly adding some fake images to the batch of real images, or randomly adding some real images to the batch of fake images.

If you are updating the discriminator with a combined batch of real and fake images, then this may involve randomly flipping the labels on some images.

The example below demonstrates this by creating 1,000 samples of real (class=1) labels and flipping them with a 5% probability, then doing the same with 1,000 samples of fake (class=0) labels.

# example of noisy labels
from numpy import ones
from numpy import zeros
from numpy.random import choice

# randomly flip some labels
def noisy_labels(y, p_flip):
	# determine the number of labels to flip
	n_select = int(p_flip * y.shape[0])
	# choose labels to flip
	flip_ix = choice([i for i in range(y.shape[0])], size=n_select)
	# invert the labels in place
	y[flip_ix] = 1 - y[flip_ix]
	return y

# generate 'real' class labels (1)
n_samples = 1000
y = ones((n_samples, 1))
# flip labels with 5% probability
y = noisy_labels(y, 0.05)
# summarize labels
print(y.sum())

# generate 'fake' class labels (0)
y = zeros((n_samples, 1))
# flip labels with 5% probability
y = noisy_labels(y, 0.05)
# summarize labels
print(y.sum())

Try running the example a few times.

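The results show that approximately 50 “1”s are flipped to 0s for the positive labels (i.e. 5% of 1,000) and approximately 50 “0”s are flipped to 1s for the negative labels. The count is approximate because choice() samples indices with replacement by default, so the same label can be selected more than once.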

950.0
49.0
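
If an exact number of flipped labels is preferred, the indices can be sampled without replacement. The sketch below is a variant of the function above, not the version used in the tutorial. It uses the Python standard library and plain lists rather than NumPy arrays, and the function name and seed are hypothetical.

```python
# sketch: flipping an exact number of labels by sampling without replacement
from random import sample, seed

def noisy_labels_exact(y, p_flip):
    # determine the number of labels to flip
    n_select = int(p_flip * len(y))
    # sample() never repeats an index, so exactly n_select labels are flipped
    flip_ix = sample(range(len(y)), n_select)
    # invert the chosen labels in a copy
    flipped = list(y)
    for ix in flip_ix:
        flipped[ix] = 1 - flipped[ix]
    return flipped

# arbitrary seed so the run is repeatable
seed(1)
real = noisy_labels_exact([1] * 1000, 0.05)
fake = noisy_labels_exact([0] * 1000, 0.05)
print(sum(real), sum(fake))
```

Here exactly 50 labels are flipped in each set on every run, whichever indices happen to be chosen.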

Summary

In this tutorial, you discovered how to implement a suite of best practices or GAN hacks that you can copy-and-paste directly into your GAN project.

Specifically, you learned:

The best sources for practical heuristics or hacks when developing generative adversarial networks.
How to implement seven best practices for the deep convolutional GAN model architecture from scratch.
How to implement four additional best practices from Soumith Chintala’s GAN Hacks presentation and list.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

45 Responses to How to Implement GAN Hacks in Keras to Train Stable Models

sukhpal June 24, 2019 at 12:16 am #

sir is we apply GAN on numerical values used for classification instead of Images or GAN works on images

- Jason Brownlee June 24, 2019 at 6:34 am #
  
  Most of the work is on using GANs for image data.
  
  There is some use on other data, such as text and time series.
  
  - P.G. September 15, 2020 at 7:24 am #
    
    So why using GAN for classification. Suppose we want high classification accuracy, the discriminator is then forced to train on poorly generated images as we will want to make discriminator “win the game” in order to have better classification. Im missing something or this is a bit of computational waste? If we are not training the generator we are also not using a lot of generated data (that is supposed to be the point of using SGAN) as we want to force an early stop scenario right? So Is there no way of making a good classifier AND a good generator? Can you make a good classiier with a bad generator and not call it a computational waste?? Btw thanks again for your epic work!
    
    - Jason Brownlee September 15, 2020 at 7:44 am #
      
      GANs are appropriate for classification when all other models fail. They can be used for semi-supervised, when you have lots of unlabelled data and only a tiny amount of labelled data.
      
      The key to GANs is the adversarial training of both models. It does not work when you train one model in isolation, e.g. just the generator or just the discriminator.
      
      If you want to train a single model, you can train a classification model directly. And you should as a point of comparison and only use a GAN if it can perform better.
      
Chad July 23, 2019 at 4:35 am #

Such a helpful post! Applied them to the Generating Dog Images kaggle competition. Although I didn’t get perfect looking dogs, these GAN hacks helped improve significantly.

Thank you!

- Jason Brownlee July 23, 2019 at 8:13 am #
  
  Well done, very cool!
  
Chad July 31, 2019 at 4:46 am #

Hey Jason,

I am wondering if there is a recommended batch size for performance improvements. Currently I compared 32 and 256. Batch size of 32 took longer to train and gave worse results than batch size 256. Is there a reasoning behind this?

- Jason Brownlee July 31, 2019 at 6:58 am #
  
  Great question!
  
  Smallish seems better on most problems, e.g. 32, 64.
  
  Too large a batch seems to cause one model to learn faster than the other, i.e. training becomes unstable and you get a failure mode.
  
  Reply
  - Chad July 31, 2019 at 7:51 am #
    
    I see, thank you. But does smaller mean longer training time for each epoch?
    
    Reply
    - Jason Brownlee July 31, 2019 at 2:05 pm #
      
      Yes.
      
      Reply
Masayo August 21, 2019 at 2:39 pm #

Question on 3. Use Label Smoothing.

Won’t this affect the loss function if the value is above 1?
I am assuming something like cross-entropy will give different results?

As for Use Noisy Labels, any intuition as to why this is better?

Reply
- Jason Brownlee August 22, 2019 at 6:19 am #
  
  The idea is to manipulate how the loss function sees the problem, to bias it.
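To make the effect on the loss concrete, here is a minimal sketch in plain NumPy (not the Keras loss itself) of binary cross-entropy with soft targets. The helper names `binary_crossentropy` and `best_prediction` are made up for this illustration. Under these assumptions, a smoothed target below 1 moves the minimum of the loss to the target value, while a target above 1 makes the loss keep decreasing (and go negative) as the prediction approaches 1.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; y_true may be a soft (smoothed) target."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(loss))

def best_prediction(target):
    """Prediction in (0, 1) with the lowest loss for a constant target (grid search)."""
    grid = np.linspace(0.001, 0.999, 999)
    losses = [binary_crossentropy(target, p) for p in grid]
    return float(grid[int(np.argmin(losses))])

# Hard label 1.0: the loss is lowest when the prediction is pushed towards 1.
print(best_prediction(1.0))
# Smoothed label 0.9: the loss is lowest near a prediction of 0.9.
print(best_prediction(0.9))
# Label above 1 (e.g. 1.2): the loss keeps falling as the prediction approaches 1
# and becomes negative, so the bias towards "real" is even stronger.
print(best_prediction(1.2), binary_crossentropy(1.2, 0.999))
```

So yes, the loss values differ from the hard-label case. Whether that bias helps is something to check on your own model.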
  
  Reply
- Gledson Melotti September 3, 2019 at 8:10 am #
  
  Hello Jason, can I use a GAN for classification?
  
  Reply
  - Jason Brownlee September 3, 2019 at 2:06 pm #
    
    You could use the discriminator as the basis for a classifier via transfer learning.
    
    Reply
    - Gledson Melotti September 3, 2019 at 6:46 pm #
      
      Thank you very much.
      
      Reply
      - Jason Brownlee September 4, 2019 at 5:56 am #
        
        You’re welcome.
Jason Salas August 24, 2019 at 3:51 pm #

Hi Jason,

GREAT article! I’m curious about the smoothing of positive class labels, using the structure in your e-book of the train() method. When applying this technique in the generate_real_samples() helper method, should the smoothing also be used in the train() method when creating inverted labels for the generator to encourage better output images?

It seems logical. Your thoughts?

Thanks for the help! (And your e-book is fantastic, btw!)

Reply
- Jason Brownlee August 25, 2019 at 6:33 am #
  
  Good question.
  
  Hmmm. Probably.
  
  Perhaps try with and without and confirm.
  
  Reply
  - Jason Salas August 25, 2019 at 9:01 am #
    
    I just ran it on a couple of experiments, and it seems to regularize a tad. Cool!
    
    Reply
    - Jason Brownlee August 26, 2019 at 6:05 am #
      
      Nice work!
      
      Reply
Fraask December 4, 2019 at 1:23 am #

When I am applying label smoothing the accuracy of the discriminator on real samples is equal to zero. Any ideas why this is the case?

Reply
- Jason Brownlee December 4, 2019 at 5:38 am #
  
  Ignore the accuracy and focus on the loss, specifically learning curves.
  
  Also focus on the images that are periodically generated, say every few epochs.
  
  Reply
  - Max December 2, 2020 at 12:55 am #
    
    I have the same issue and am wondering why this is the case. Maybe it’s something fundamental I am missing?
    
    Jason, do you have any idea why this is the case?
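One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if the reported accuracy is computed by thresholding the prediction at 0.5 and testing exact equality with the label, then a smoothed label such as 0.9 can never equal a thresholded prediction of 0 or 1. The sketch below mimics that behaviour in plain NumPy; `thresholded_accuracy` is a hypothetical stand-in, not the Keras metric itself.

```python
import numpy as np

def thresholded_accuracy(y_true, y_pred, threshold=0.5):
    """Accuracy that compares thresholded predictions to labels by exact equality."""
    y_hat = (y_pred > threshold).astype(y_true.dtype)
    return float(np.mean(y_hat == y_true))

# The discriminator is confident that all four samples are real.
y_pred = np.array([0.95, 0.80, 0.70, 0.99])

hard_labels = np.ones(4)          # class label 1.0
smooth_labels = np.full(4, 0.9)   # smoothed class label 0.9

acc_hard = thresholded_accuracy(hard_labels, y_pred)      # 1.0
acc_smooth = thresholded_accuracy(smooth_labels, y_pred)  # 0.0
print(acc_hard, acc_smooth)
```

If this is what is happening, the zero accuracy is an artifact of the metric and not a sign that the discriminator has failed, which fits the advice to watch the loss and the generated images instead.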
    
    Reply
Dang Tuan Hoang December 18, 2019 at 8:27 pm #

Hi Jason, thanks for the great article!
Is it recommended to use both label smoothing and noisy labels?
Also, I read that Soumith (https://github.com/soumith/ganhacks) recommends using SGD for the discriminator and Adam for the generator. What are your thoughts on this?

Reply
- Jason Brownlee December 19, 2019 at 6:29 am #
  
  Perhaps try it on your specific GAN model.
  
  I have not found a need to use either.
  
  Reply
Siddarth Venkateswaran May 8, 2020 at 12:18 pm #

Hi Jason, great blog as usual.

Can these principles be applied to training autoencoders as well? As in:
1) ReLU for encoder layers and LeakyReLU for decoder layers
2) ‘tanh’ activation function for the last layer of the decoder network, etc.

Reply
- Jason Brownlee May 8, 2020 at 1:04 pm #
  
  Generally, no. The tips focus on GANs and how unstable the training process is.
  
  Autoencoders are super stable to train.
  
  Reply
Ujjayant Sinha July 3, 2020 at 9:31 pm #

Does a shallow discriminator network fare better than a deeper one in terms of mode collapse and stability? E.g. the discriminator in your pix2pix post vs. one with more layers.

Also, this post was very helpful. Thank you.

Reply
- Jason Brownlee July 4, 2020 at 5:59 am #
  
  It really depends on the configuration of the model and the specifics of the dataset.
  
  Reply
  - Ujjayant Sinha July 16, 2020 at 12:04 am #
    
    A deeper model worked in my case.
    Also, if we flip labels, it would bias or confuse the discriminator and thus prevent the generator from reusing the same set of features to fool it. Am I correct about this intuition behind flipping some of the labels (4. Use Noisy Labels)?
    
    Reply
    - Jason Brownlee July 16, 2020 at 6:41 am #
      
      A little noise is good – keeps the model learning, a lot of noise is confusing.
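As a rough illustration of “a little noise”, the sketch below flips a small fraction of binary class labels before they are passed to the discriminator. The function name `flip_labels` and the 5% flip rate are assumptions made for this example, not values taken from the thread.

```python
import numpy as np

def flip_labels(y, p_flip, rng):
    """Return a copy of binary labels y with a fraction p_flip of them inverted."""
    y = np.array(y, dtype=float)               # copy so the caller's array is untouched
    n_flip = int(round(p_flip * y.shape[0]))   # number of labels to invert
    idx = rng.choice(y.shape[0], size=n_flip, replace=False)
    y[idx] = 1.0 - y[idx]                      # 1 -> 0 and 0 -> 1
    return y

rng = np.random.default_rng(1)
y_real = np.ones((1000, 1))                # class labels for a batch of real images
y_noisy = flip_labels(y_real, 0.05, rng)   # roughly 5% are now labelled as fake
print(y_noisy.sum())
```

Raising `p_flip` too far moves from “a little noise” towards labels that carry no signal, so it is worth treating the rate as a hyperparameter to tune.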
      
      Reply
Juan August 27, 2020 at 2:33 am #

Hi Jason, Great tutorials!!!

I just acquired your book and I’m playing a little with the hacks you recommended. Running the complete code just as you delivered it gave the following learning curves (not very good):

https://drive.google.com/file/d/1C6rJP0NfynCpyV5THhfz_INphLH6E3Pn/view?usp=sharing

Then I changed it to use SGD on the discriminator and ReLU on the generator, with no particular improvement:
https://drive.google.com/file/d/1uv-u8dePjApo8gC1N-bipfbUAwUbNfm_/view?usp=sharing

What got the best boost was using label smoothing and noisy labels (with previous changes):
1. Using only label smoothing
https://drive.google.com/file/d/1bbUkzODyF6m_qb1IinO_bt9dkRJWVEj5/view?usp=sharing

2. Using both Label smoothing and Noisy labels
https://drive.google.com/file/d/118b72LaAuCtq9zfmofIBGyPIXxUkb1Ke/view?usp=sharing

But the generator loss still keeps converging towards zero, which makes me think that it is probably fooling the discriminator with garbage somehow. Is this possible? I think the generated images from this last setting confirm my suspicion:

https://drive.google.com/file/d/1mCOpn7Vj7-PqERXY9tHnfIpSoLkyo4xU/view?usp=sharing

Even though you can see something in the middle trying to be an eight, it’s still super noisy in my opinion, and even repetitive.

I’m still struggling to raise the generator loss above 1.0 as you suggest. Any thoughts?

Reply
- Juan August 27, 2020 at 2:39 am #
  
  I think the discriminator is failing to detect fake samples more often than it fails to detect real samples. That may be why it is easy for the generator to fool the discriminator with garbage. Maybe applying noise to only one of the batches (real or fake) could help. I assume both discriminator losses should be fairly similar.
  
  Reply
- Jason Brownlee August 27, 2020 at 6:23 am #
  
  Nice progress Juan!
  
  Perhaps the models are too small/not powerful enough for your dataset?
  
  Try scaling them up a lot, or taking working model architectures from another similar project as a starting point?
  
  Perhaps try reducing your dataset (size or complexity) to see if you can get something started.
  
  Try things to learn more about the cause of the difficulty.
  
  Let me know how you go.
  
  Reply
  - Juan August 27, 2020 at 7:56 am #
    
    I forgot to say that I’m using the MNIST (digit 8) dataset for all the experiments, sorry. I had made some mistakes generating the noisy labels, but I corrected them and got this result:
    
    https://drive.google.com/file/d/1fTvUrWsaddvxM8gx6aM24gCVAkvF1arg/view?usp=sharing
    
    The discriminator loss on fake samples is still way above the “acceptable value” for the MNIST example, and the generator loss is still going towards zero. I will try tuning some hyperparameters, or maybe I will try a less complex digit such as 1 to see if I can get better results.
    
    Thanks
    
    Reply
    - Jason Brownlee August 27, 2020 at 1:34 pm #
      
      Perhaps ignore accuracy for now, focus on loss and the quality of the generated images.
      
      Reply
Kadd September 21, 2020 at 1:33 pm #

Great article!
I would like to know whether smooth and noisy labels should also be used for the generated image labels when training the generator?

Reply
- Jason Brownlee September 21, 2020 at 2:37 pm #
  
  Noisy labels can help in some cases. Perhaps try it with your model and compare the results.
  
  Reply
Zohre November 17, 2020 at 3:28 am #

Hi Jason,
Thank you for your great tutorials. I have read all your tutorials about GANs. I want to use a GAN for data augmentation for my thesis. I found learning rates for which the discriminator and generator losses converged to around 0.7 and 0.8, but after many iterations the generator does not produce good results. It just produces the global structure without any of the details of the images. What does this mean?

Reply
- Jason Brownlee November 17, 2020 at 6:33 am #
  
  GANs don’t converge:
  https://machinelearningmastery.com/faq/single-faq/why-is-my-gan-not-converging
  
  Perhaps try tuning your model architecture?
  
  Perhaps try some gan hacks:
  https://machinelearningmastery.com/how-to-code-generative-adversarial-network-hacks/
  
  Reply
Dhruvam Panchal June 30, 2021 at 3:48 pm #

Hi Jason,

I am working on AnoVAEGAN for a project and your resources have been very helpful.

However, I have noticed that after around 60-70 epochs, the generator and discriminator reach an equilibrium in their loss values. The result is not as expected, and I want the generator to improve further (i.e. give a sharper and more detailed image).

Would adding noise to the discriminator input image and label smoothing help? Are there any more tricks you would suggest trying?

Reply
- Jason Brownlee July 1, 2021 at 5:00 am #
  
  Perhaps experiment with some of the methods above and discover what works well in your specific case.
  
  Reply
Divya Bhatia September 18, 2021 at 6:20 am #

Hi Jason,
I am working on a multiclass classification problem (4 classes) and I am unable to understand how I should apply label smoothing to a one-hot encoded label, e.g. what should 1 0 0 0 become?

Reply
- Adrian Tam September 19, 2021 at 6:24 am #
  
  For example, [1.3, 0.2, 0.14, 0.19]
  Simply adding a very small random number to each value would work.
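A minimal sketch of two ways this could be done for a 4-class one-hot label. `jitter_one_hot` follows the reply's suggestion of adding a small random number to each value. `smooth_one_hot` is a commonly used uniform alternative that is not mentioned in the thread. Both function names and the `eps` and `scale` values are assumptions chosen for illustration.

```python
import numpy as np

def smooth_one_hot(y, eps=0.1):
    """Uniform label smoothing: move eps of the probability mass evenly across all classes."""
    y = np.asarray(y, dtype=float)
    n_classes = y.shape[-1]
    return y * (1.0 - eps) + eps / n_classes

def jitter_one_hot(y, scale=0.05, rng=None):
    """Add a small random non-negative offset to every entry of a one-hot label."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    return y + rng.uniform(0.0, scale, size=y.shape)

label = [1, 0, 0, 0]
smoothed = smooth_one_hot(label)   # [0.925, 0.025, 0.025, 0.025]
jittered = jitter_one_hot(label, rng=np.random.default_rng(0))
print(smoothed, jittered)
```

The uniform version keeps the label summing to 1, which matters if the loss expects a probability distribution. The jittered version does not, so it is worth checking how your loss function treats such targets.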
  
  Reply
Ahmed Gamal Habashy June 5, 2022 at 5:05 pm #

Hi Jason,
Your articles are great. I have a question about batch normalization.
Whenever I add batch normalization, the results get worse and the model fails to converge!
I have found this question repeated elsewhere without an accepted solution.
(The implementation uses Python 3.9 in Spyder.)

Reply

