A Gentle Introduction to StyleGAN, the Style Generative Adversarial Network

Generative Adversarial Networks, or GANs for short, are effective at generating large high-quality images.

Most improvements to GANs have been made to the discriminator models in an effort to train more effective generator models, while comparatively little effort has gone into improving the generator models themselves.

The Style Generative Adversarial Network, or StyleGAN for short, is an extension to the GAN architecture that proposes large changes to the generator model, including the use of a mapping network to map points in latent space to an intermediate latent space, the use of the intermediate latent space to control style at each point in the generator model, and the introduction of noise as a source of variation at each point in the generator model.

The resulting model is not only capable of generating impressively photorealistic, high-quality photos of faces, but it also offers control over the style of the generated image at different levels of detail by varying the style vectors and noise.

In this post, you will discover the Style Generative Adversarial Network that gives control over the style of generated synthetic images.

After reading this post, you will know:

  • The lack of control over the style of synthetic images generated by traditional GAN models.
  • The architecture of the StyleGAN model that introduces control over the style of generated images at different levels of detail.
  • Impressive results achieved with the StyleGAN architecture when used to generate synthetic human faces.

Kick-start your project with my new book Generative Adversarial Networks with Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to Style Generative Adversarial Network (StyleGAN)
Photo by Ian D. Keating, some rights reserved.

Overview

This tutorial is divided into four parts; they are:

  1. Lacking Control Over Synthesized Images
  2. Control Style Using New Generator Model
  3. What Is the StyleGAN Model Architecture
  4. Examples of StyleGAN Generated Images

Lacking Control Over Synthesized Images

Generative adversarial networks are effective at generating high-quality and large-resolution synthetic images.

The generator model takes as input a point from latent space and generates an image. This model is trained by a second model, called the discriminator, that learns to differentiate real images from the training dataset from fake images generated by the generator model. As such, the two models compete in an adversarial game and find a balance or equilibrium during the training process.
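
As a concrete illustration of this training process, the sketch below shows a bare-bones version of the adversarial loop using toy Keras models on 8×8 single-channel images. The model sizes, random "real" data, and hyperparameters are illustrative stand-ins only, not a recipe for a working GAN.

```python
# A bare-bones sketch of the adversarial training loop described above, using
# toy Keras models on 8x8 single-channel "images". All model sizes, data, and
# hyperparameters here are illustrative stand-ins, not a working GAN recipe.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 16

# Generator: maps a point in latent space to an image.
generator = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(8 * 8, activation="sigmoid"),
    layers.Reshape((8, 8, 1)),
])

# Discriminator: predicts whether an image is real (1) or fake (0).
discriminator = keras.Sequential([
    layers.Flatten(input_shape=(8, 8, 1)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Composite model: updates the generator through a frozen discriminator.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real_images = np.random.rand(256, 8, 8, 1)  # stand-in for a real training set

for step in range(100):
    z = np.random.randn(32, latent_dim)
    fake = generator.predict(z, verbose=0)
    real = real_images[np.random.randint(0, 256, 32)]
    # The discriminator learns to separate real images from generated ones.
    discriminator.train_on_batch(real, np.ones((32, 1)))
    discriminator.train_on_batch(fake, np.zeros((32, 1)))
    # The generator learns to make the discriminator call its output "real".
    gan.train_on_batch(np.random.randn(32, latent_dim), np.ones((32, 1)))
```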

Many improvements to the GAN architecture have been achieved through enhancements to the discriminator model. These changes are motivated by the idea that a better discriminator model will, in turn, lead to the generation of more realistic synthetic images.

As such, the generator has been somewhat neglected and remains a black box. For example, the source of randomness used in the generation of synthetic images is not well understood, including both the amount of randomness in the sampled points and the structure of the latent space.

Yet the generators continue to operate as black boxes, and despite recent efforts, the understanding of various aspects of the image synthesis process, […] is still lacking. The properties of the latent space are also poorly understood …

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

This limited understanding of the generator is perhaps most exemplified by the general lack of control over the generated images. There are few tools to control the properties of generated images, e.g. the style. This includes high-level features such as background and foreground, and fine-grained details such as the features of synthesized objects or subjects.

This requires both disentangling features or properties in images and adding controls for these properties to the generator model.

Want to Develop GANs from Scratch?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Control Style Using New Generator Model

The Style Generative Adversarial Network, or StyleGAN for short, is an extension to the GAN architecture to give control over the disentangled style properties of generated images.

Our generator starts from a learned constant input and adjusts the “style” of the image at each convolution layer based on the latent code, therefore directly controlling the strength of image features at different scales

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

The StyleGAN is an extension of the progressive growing GAN, an approach for training generator models capable of synthesizing very large, high-quality images via the incremental expansion of both the discriminator and generator models from small to large images during training.

In addition to the incremental growing of the models during training, StyleGAN changes the architecture of the generator significantly.

The StyleGAN generator no longer takes a point from the latent space as input; instead, there are two new sources of randomness used to generate a synthetic image: a standalone mapping network and noise layers.

The output from the mapping network is a vector that defines the style, which is integrated at each point in the generator model via a new layer called adaptive instance normalization (AdaIN). The use of this style vector gives control over the style of the generated image.

Stochastic variation is introduced through noise added at each point in the generator model. The noise is added to entire feature maps, allowing the model to interpret the style in a fine-grained, per-pixel manner.

This per-block incorporation of style vector and noise allows each block to localize both the interpretation of style and the stochastic variation to a given level of detail.

The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

What Is the StyleGAN Model Architecture

The StyleGAN is described as a progressive growing GAN architecture with five modifications, each of which was added and evaluated incrementally in an ablative study.

The incremental changes to the generator are:

  • Baseline Progressive GAN.
  • Addition of tuning and bilinear upsampling.
  • Addition of mapping network and AdaIN (styles).
  • Removal of latent vector input to generator.
  • Addition of noise to each block.
  • Addition of mixing regularization.

The image below summarizes the StyleGAN generator architecture.

Summary of the StyleGAN Generator Model Architecture.
Taken from: A Style-Based Generator Architecture for Generative Adversarial Networks.

We can review each of these changes in more detail.

1. Baseline Progressive GAN

The StyleGAN generator and discriminator models are trained using the progressive growing GAN training method.

This means that both models start with small images, in this case, 4×4 images. The models are fit until stable, then both discriminator and generator are expanded to double the width and height (quadruple the area), e.g. 8×8.

A new block is added to each model to support the larger image size, which is faded in slowly over training. Once faded-in, the models are again trained until reasonably stable and the process is repeated with ever-larger image sizes until the desired target image size is met, such as 1024×1024.
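
As a rough illustration of the fade-in step, the NumPy sketch below blends an upsampled version of the old output with the output of a newly added block, using a weight alpha that ramps from 0 to 1 over training. The toy "toRGB" and block functions are stand-ins for the real layers, not the actual StyleGAN implementation.

```python
# A minimal numpy sketch of fading in a new, larger block during progressive
# growing: the image is a weighted sum of the old (upsampled) output and the
# new block's output, with alpha ramped from 0 to 1 over training.
import numpy as np

def nearest_upsample(x):
    # Double width and height by repeating pixels (nearest-neighbor upsampling).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def old_to_rgb(x):
    # Stand-in for the existing 1x1 "toRGB" output layer of the smaller model.
    return np.tanh(x[..., :3])

def new_block_to_rgb(x):
    # Stand-in for the newly added convolutional block plus its own toRGB layer.
    return np.tanh(x[..., -3:])

old_4x4 = np.random.randn(4, 4, 16)    # feature maps from the last existing block

for alpha in (0.0, 0.5, 1.0):           # alpha ramps from 0 to 1 during fade-in
    upsampled = nearest_upsample(old_4x4)
    image = (1.0 - alpha) * old_to_rgb(upsampled) + alpha * new_block_to_rgb(upsampled)
    print(alpha, image.shape)            # (8, 8, 3) throughout the fade-in
```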

For more on the progressive growing GAN, see the paper Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

2. Bilinear Sampling

The progressive growing GAN uses nearest neighbor layers for upsampling instead of transpose convolutional layers that are common in other generator models.

The first point of deviation in the StyleGAN is that bilinear upsampling layers are used instead of nearest neighbor.

We replace the nearest-neighbor up/downsampling in both networks with bilinear sampling, which we implement by lowpass filtering the activations with a separable 2nd order binomial filter after each upsampling layer and before each downsampling layer.

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.
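
Note that the paper implements this change by low-pass filtering the activations with a binomial filter. As a rough, simplified illustration of the difference between the two upsampling schemes only, the Keras snippet below contrasts nearest-neighbor and bilinear upsampling on a toy feature map; it does not reproduce the paper's exact filtering.

```python
# Contrast nearest-neighbor and bilinear upsampling on a toy 4x4 feature map.
import numpy as np
from tensorflow.keras import layers

x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)   # a toy 4x4 feature map

nearest = layers.UpSampling2D(size=(2, 2), interpolation="nearest")(x)
bilinear = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(x)

# Both produce an 8x8 map; nearest neighbor repeats pixels (blocky edges),
# while bilinear interpolates between them (smoother transitions).
print(nearest.numpy()[0, :3, :3, 0])
print(bilinear.numpy()[0, :3, :3, 0])
```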

3. Mapping Network and AdaIN

Next, a standalone mapping network is used that takes a randomly sampled point from the latent space as input and generates a style vector.

The mapping network is comprised of eight fully connected layers, i.e. it is a standard feed-forward neural network (a multilayer perceptron).

For simplicity, we set the dimensionality of both [the latent and intermediate latent] spaces to 512, and the mapping f is implemented using an 8-layer MLP …

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.
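
A minimal Keras sketch of a mapping network with this shape is shown below: eight fully connected layers from a 512-dimensional latent point z to a 512-dimensional intermediate point w. The leaky ReLU activation follows the official implementation (also noted in the comments below); other details such as weight initialization and learning-rate scaling are omitted here.

```python
# A minimal sketch of the 8-layer mapping network from z (512-d) to w (512-d).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_mapping_network(latent_dim=512, n_layers=8):
    # Eight fully connected layers mapping z to the intermediate latent space W.
    model = keras.Sequential()
    model.add(layers.Input(shape=(latent_dim,)))
    for _ in range(n_layers):
        model.add(layers.Dense(latent_dim))
        model.add(layers.LeakyReLU(0.2))
    return model

mapping = build_mapping_network()
z = np.random.randn(1, 512).astype("float32")   # point sampled from latent space Z
w = mapping.predict(z, verbose=0)                # point in intermediate latent space W
print(w.shape)                                   # (1, 512)
```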

The style vector is then transformed and incorporated into each block of the generator model after the convolutional layers via an operation called adaptive instance normalization or AdaIN.

The AdaIN layers involve first standardizing each feature map to a standard Gaussian, then scaling and shifting it using the style vector, which supplies a per-channel scale and bias.

Learned affine transformations then specialize [the intermediate latent vector] to styles y = (ys, yb) that control adaptive instance normalization (AdaIN) operations after each convolution layer of the synthesis network g.

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

Calculation of the adaptive instance normalization (AdaIN) in the StyleGAN.
Taken from: A Style-Based Generator Architecture for Generative Adversarial Networks.
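
The calculation in the figure can be sketched in a few lines of NumPy: each feature map is standardized across its spatial dimensions, then scaled and shifted by the per-channel style values y_s and y_b, which come from a learned affine transformation of the intermediate latent vector. The random arrays below are placeholders for real activations, styles, and affine weights.

```python
# A numpy sketch of the AdaIN calculation shown in the figure above.
import numpy as np

def adain(x, y_s, y_b, eps=1e-8):
    # x: feature maps of shape (height, width, channels)
    # y_s, y_b: per-channel style scale and bias, each of shape (channels,)
    mu = x.mean(axis=(0, 1), keepdims=True)     # per-channel spatial mean
    sigma = x.std(axis=(0, 1), keepdims=True)   # per-channel spatial std
    x_norm = (x - mu) / (sigma + eps)           # standardize each feature map
    return y_s.reshape(1, 1, -1) * x_norm + y_b.reshape(1, 1, -1)

channels = 64
x = np.random.randn(8, 8, channels)             # activations after a convolution
w = np.random.randn(512)                        # intermediate latent vector
A = np.random.randn(512, 2 * channels)          # stand-in for the learned affine transform
y = w @ A                                       # style y = (y_s, y_b)
out = adain(x, y[:channels], y[channels:])
print(out.shape)                                # (8, 8, 64)
```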

The addition of the new mapping network to the architecture also results in the renaming of the generator model to a “synthesis network.”

4. Removal of Latent Point Input

The next change involves modifying the generator model so that it no longer takes a point from the latent space as input.

Instead, the model has a learned constant 4×4×512 tensor as input in order to start the image synthesis process.
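
A minimal TensorFlow sketch of this idea is shown below: the starting activations are a single trainable 4×4×512 tensor, learned by backpropagation like any other weight, and simply tiled across the batch at generation time. The variable names here are illustrative assumptions, not the official implementation.

```python
# A minimal sketch of the learned constant input to the synthesis network.
import tensorflow as tf

batch_size = 4

# A single block of starting activations, learned by backpropagation like any
# other weight, rather than being supplied as an input to the network.
const_input = tf.Variable(tf.ones((1, 4, 4, 512)), trainable=True, name="const_input")

# At generation time the same constant is tiled across the batch and then
# passed through the synthesis blocks (not shown here).
x = tf.tile(const_input, [batch_size, 1, 1, 1])
print(x.shape)  # (4, 4, 4, 512)
```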

5. Addition of Noise

The output of each convolutional layer in the synthesis network is a block of activation maps.

Gaussian noise is added to each of these activation maps prior to the AdaIN operations. A different single-channel noise image is generated for each layer and is broadcast to all feature maps using learned per-feature scaling factors.

These are single-channel images consisting of uncorrelated Gaussian noise, and we feed a dedicated noise image to each layer of the synthesis network. The noise image is broadcasted to all feature maps using learned per-feature scaling factors and then added to the output of the corresponding convolution …

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

This noise is used to introduce stochastic variation at a given level of detail.
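
The NumPy sketch below illustrates the mechanism described in the quote above: one single-channel Gaussian noise image per layer, broadcast across all feature maps with a learned per-channel scaling factor and added to the convolution output. All arrays are placeholder values.

```python
# A numpy sketch of the per-layer noise inputs described above.
import numpy as np

def add_noise(x, per_channel_scale, rng):
    # x: convolution output of shape (height, width, channels)
    h, w, c = x.shape
    noise = rng.standard_normal((h, w, 1))        # one single-channel noise image
    # Broadcast the noise to every feature map, weighted per channel.
    return x + per_channel_scale.reshape(1, 1, c) * noise

rng = np.random.default_rng(0)
conv_out = rng.standard_normal((16, 16, 128))      # output of one convolution layer
scale = 0.1 * rng.standard_normal(128)             # learned per-feature scaling factors
noisy = add_noise(conv_out, scale, rng)
print(noisy.shape)                                 # (16, 16, 128)
```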

6. Mixing Regularization

Mixing regularization involves first generating two style vectors from the mapping network.

A split point in the synthesis network is chosen; all AdaIN operations prior to the split point use the first style vector and all AdaIN operations after the split point use the second style vector.

… we employ mixing regularization, where a given percentage of images are generated using two random latent codes instead of one during training.

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

This encourages the layers and blocks to localize the style to specific parts of the model and corresponding level of detail in the generated image.
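A NumPy sketch of the idea is shown below: two latent points are mapped to two style vectors, a random crossover point is chosen among the style inputs (18 of them in the 1024×1024 configuration, two per resolution), and the styles before the crossover come from the first vector while those after it come from the second. The mapping function here is a stand-in for the real 8-layer mapping network.

```python
# A numpy sketch of mixing regularization: two style vectors split at a random point.
import numpy as np

rng = np.random.default_rng(0)
n_styles = 18                  # style inputs at 1024x1024 (two per resolution, 4x4 to 1024x1024)

def mapping(z):
    # Stand-in for the real 8-layer mapping network.
    return np.tanh(z)

z1, z2 = rng.standard_normal((2, 512))             # two random latent codes
w1, w2 = mapping(z1), mapping(z2)

crossover = rng.integers(1, n_styles)               # random split point in the network
# Layers before the split use the first style; layers after it use the second.
per_layer_styles = [w1 if i < crossover else w2 for i in range(n_styles)]
print(crossover, len(per_layer_styles))              # split point and number of style inputs
```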

Examples of StyleGAN Generated Images

The StyleGAN is both effective at generating large high-quality images and at controlling the style of the generated images.

In this section, we will review some examples of generated images.

A video demonstrating the capability of the model was released by the authors of the paper, providing a useful overview.

High-Quality Faces

The image below, taken from the paper, shows synthetic faces generated with the StyleGAN at a resolution of 1024×1024.

Example of High-Quality Generated Faces Using the StyleGAN.
Taken from: A Style-Based Generator Architecture for Generative Adversarial Networks.

Varying Style by Level of Detail

The use of different style vectors at different points of the synthesis network gives control over the styles of the resulting image at different levels of detail.

For example, blocks of layers in the synthesis network at lower resolutions (e.g. 4×4 and 8×8) control high-level styles such as pose and hairstyle. Blocks of layers in the middle of the network (e.g. 16×16 and 32×32) control smaller-scale features such as hairstyle details and facial expression. Finally, blocks of layers closer to the output end of the network (e.g. 64×64 to 1024×1024) control color schemes and very fine details.

The image below, taken from the paper, shows two sets of generated faces: one down the left-hand side and one across the top. The grid of images in between was generated using the style vectors of the faces on the left, except at the lower (coarse) levels of the synthesis network, where the style vectors of the faces across the top are used instead. This allows the images on the left to adopt high-level styles, such as pose and hairstyle, from the image at the top of each column.

Copying the styles corresponding to coarse spatial resolutions (4² – 8²) brings high-level aspects such as pose, general hair style, face shape, and eyeglasses from source B, while all colors (eyes, hair, lighting) and finer facial features resemble A.

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

Example of One Set of Generated Faces (Left) Adopting the Coarse Style of Another Set of Generated Faces (Top).
Taken from: A Style-Based Generator Architecture for Generative Adversarial Networks.

Use of Noise to Control Level of Detail

The authors varied the use of noise at different levels of detail in the model (e.g. fine, middle, coarse), much like the previous example of varying style.

The result is that noise gives control over the generation of detail, from broader structure when noise is used in the coarse blocks of layers to the generation of fine detail when noise is added to the layers closer to the output of the network.

We can see that the artificial omission of noise leads to featureless “painterly” look. Coarse noise causes large-scale curling of hair and appearance of larger background features, while the fine noise brings out the finer curls of hair, finer background detail, and skin pores.

A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.

Example of Varying Noise at Different Levels of the Generator Model.
Taken from: A Style-Based Generator Architecture for Generative Adversarial Networks.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

  • A Style-Based Generator Architecture for Generative Adversarial Networks, 2018.
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017.

Summary

In this post, you discovered the Style Generative Adversarial Network that gives control over the style of generated synthetic images.

Specifically, you learned:

  • The lack of control over the style of synthetic images generated by traditional GAN models.
  • The architecture of the StyleGAN model that introduces control over the style of generated images at different levels of detail.
  • Impressive results achieved with the StyleGAN architecture when used to generate synthetic human faces.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Develop Generative Adversarial Networks Today!

Generative Adversarial Networks with Python

Develop Your GAN Models in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Generative Adversarial Networks with Python

It provides self-study tutorials and end-to-end projects on:
DCGAN, conditional GANs, image translation, Pix2Pix, CycleGAN
and much more...

Finally Bring GAN Models to your Vision Projects

Skip the Academics. Just Results.

See What's Inside

47 Responses to A Gentle Introduction to StyleGAN, the Style Generative Adversarial Network

  1. V N S Rama Krishna P September 5, 2019 at 7:28 pm #

    Hey Jason, I really love your work. Can you please help me by clarifying the internal architecture of AdaIN? (Just give me a brief overview). Thanks in advance.

    • Jason Brownlee September 6, 2019 at 4:55 am #

      Thanks for the suggestion, I may cover it in the future.

  2. Saru September 20, 2019 at 8:22 pm #

    How can you validate if the image is real or fake programmatically ?

    • Jason Brownlee September 21, 2019 at 6:50 am #

      The discriminator will make this prediction.

      • Ponraj S June 25, 2020 at 9:57 pm #

        Hi Mr.Jason could you please make a tutorial for styleGAN and style mixing using keras and tensorflow

  3. LP January 2, 2020 at 7:48 pm #

    Hey Jason, I’m appreciate your intorduction about the GANs. In the styleGan, I am confused about the latent space disentanglement? what ‘s that. Can you give us more detailed explanation

  4. Fredrik January 23, 2020 at 12:48 am #

    Thanks for the effort of the high-level explanation of many GAN-papers.

    A minor question. You say “The mapping network is comprised of eight fully connected layers, e.g. it is a standard deep convolutional neural network.” I think the mapping network is a standard feed-forward network (MLP) without convolutional layers so I think there is a small mistake here?

    • Jason Brownlee January 23, 2020 at 6:38 am #

      Thanks. Yes, looks like a typo. Fixed.

      • Hashem Hashemi February 8, 2021 at 6:42 pm #

        Anyone know what activation is used in this MLP?

        • Jason Brownlee February 9, 2021 at 6:31 am #

          It’s not an MLP, it’s a GAN.

          ReLU remain popular in GANs, you can get started here:
          https://machinelearningmastery.com/start-here/#gans

          • Hashem Hashemi February 13, 2021 at 5:42 pm #

            The mapping network (which converts the latent space to w) is basically an MLP. From StyleGAN code it looks like they’re using Leaky-ReLU for its activations (if anyone is looking for the same).

          • Jason Brownlee February 14, 2021 at 5:04 am #

            Thanks.

  5. Sokina February 9, 2020 at 9:33 pm #

    Hi, thank you for good tutorial. I have a question about images that generated using sours A and sours B. For example i want using my sours of images A to generate B, for this how i can input A sours. Or the sours A and B taken from training data?

  6. mohammad hossein ashoori March 11, 2020 at 6:14 pm #

    hello
    thank you for great tutorial

    i have two questions about adding input noise to output of conv layers
    i understand that we generate a guassian noise for each conv layer

    1-
    we use different (new random samples) noise during training
    or keep it same and just train B variables (per-channel scale factors)

    2-
    and what in inference time (using random or deterministic noise)?

    • Jason Brownlee March 12, 2020 at 8:42 am #

      Good question, from the tutorial:

      The StyleGAN generator no longer takes a point from the latent space as input; instead, there are two new sources of randomness used to generate a synthetic image: a standalone mapping network and noise layers.

  7. Ayush March 27, 2020 at 12:54 am #

    Can you tell me how to test custom image on stylemixing using styleGAN architecture.

    • Jason Brownlee March 27, 2020 at 6:15 am #

      I don’t have a tutorial on this topic, perhaps in the future.

  8. Jamshid May 10, 2020 at 6:21 am #

    Topic “Varying Style by Level of Detail” is duplicated, i guess second one should be “level of noise” 🙂

  9. Hasan May 14, 2020 at 11:59 pm #

    Hi Jason

    Thank for the article. Any chances of you showing how to implement this from scratch?

  10. Josh B. May 19, 2020 at 12:43 am #

    Hi Jason,

    Pretty interesting article unveiling the Stylegan network. Question, if one had thousands of real aircraft trajectory data and were to generate hiperrealistic (and diverse) synthetic aircraft trajectories (altitude, latitude, longitude; multivariate time series), would you use:
    1 – Wasserstein GAN with GP using LSTMs/GRUs.
    2 – Try to modify the stylegan architecture to use LSTMs/GRUs cells and generate sequencies?
    3 – other..

    It would be great to have your opinion,
    Thank you and good job!

    • Jason Brownlee May 19, 2020 at 6:07 am #

      Thanks.

      For time series, I would not recommend a GAN as they are for images, I’d recommend checking the literature for generative models specific to time series.

  11. Dodger June 25, 2020 at 2:16 pm #

    I’m trying to come up with a way to use a GAN to generate textures for 3D models.

    Additionally, it should be possible to build 3D shapes the same way, as a 3D shape can be encoded in a 2D image using vector displacement.

    • Jason Brownlee June 26, 2020 at 5:29 am #

      That sounds like a fun project, let me know how you go!

  12. Santosh September 3, 2020 at 1:15 pm #

    Thanks for the lovely article on styleGAN. Learned a lot

  13. Juan Márquez October 23, 2020 at 7:50 am #

    Hi Jason, are samples A and B both taken from previous fake generated images or does either one of them have to come from a real life human? I’m just curious to know if the AI could potentially create a face out of nothing if it just knew, by help of the discriminator, what a real face should look like.

    Thanks,

    • Jason Brownlee October 23, 2020 at 1:38 pm #

      Samples A and B are real photos from two different domains.

  14. Udith November 30, 2020 at 6:54 pm #

    Thanks for the great tutorial. What is the “Latent Z” in styleGAN architecture and how we can obtain it. Is it sampled from Gaussian distribution as in normal GANs or obtained by feeding images to some pre-trained network?

    • Jason Brownlee December 1, 2020 at 6:18 am #

      You’re welcome.

      From the tutorial “The StyleGAN generator no longer takes a point from the latent space as input”

      I recommend re-reading.

  15. Rick Darwin December 27, 2020 at 4:46 am #

    Is it possible to direct StyleGAN to make a specific sheaf of images, eg., ‘males, about age 25’ rather than random faces?
    If so, how would one specify these parameters?

    • Jason Brownlee December 27, 2020 at 5:05 am #

      Perhaps – I believe so. It really depends on the specific framing of the problem and the training data you have available – where you can associate images with the specific input variables. A straight stylegan might not be the best fit, it might be better to use a variation that gives you trainable control variables.

      I don’t have much on this, you may need to dive into the literature to discover the latest.

  16. Abhijit Pal February 9, 2021 at 7:29 am #

    Hello Jason, thanks for the tutorial. In the paper, the authors have mentioned that they have used a learned constant, rather than a random latent vector. Is this learned constant, a randomly initialized vector which is later updated during the training time through backpropagation(like weights of a convolution layer) and then treated as a constant during the inference time?

    • Jason Brownlee February 9, 2021 at 7:49 am #

      Not sure off the cuff, I assume it is a learned/adapted vector input.

  17. Abhijit Pal February 28, 2021 at 10:06 pm #

    Are the Learned affine transformations(before the AdaIN block) just fully connected layers, whose weights are learned over the training period or something else?

    • Jason Brownlee March 1, 2021 at 5:35 am #

      From memory, I believe so – you can check the paper and linked code project to be sure.

  18. Tabriz Nuruyev April 23, 2021 at 6:41 pm #

    Hi Jason,

    I wonder if styleGAN can be used for feature extraction to be used for feature rehearsal in incremental learning scenarios? Generally, is styleGAN the best choice among variations of GANs for feature extraction?

  19. Mitraj May 1, 2021 at 10:05 pm #

    how conv 3*3 works or what is its algorithm used in synthesis network of styleGAN?

  20. Mr. X. May 12, 2022 at 5:46 pm #

    Hej Jason,
    I am trying to build a GAN that is transferring a style from a piece of art to a fashion item like a t-shirt in a picture. What type of GAN would you recommend for it?

  21. deepa May 20, 2023 at 3:06 am #

    sir can you please give brief explanation about how to make custom dataset so we can train on that please give it help to my project work
