Running Stable Diffusion with Python

Stable Diffusion is a deep learning model that can generate pictures. In essence, it is a program to which you provide input (such as a text prompt) and get back a tensor that represents an array of pixels, which you can then save as an image file. You are not required to use any particular user interface; in fact, before any user interface existed, the only way to run Stable Diffusion was in code.

In this tutorial, we will see how you can use the diffusers library from Hugging Face to run Stable Diffusion.

After finishing this tutorial, you will learn:

  • How to install the diffusers library and its dependencies
  • How to create a pipeline in diffusers
  • How to fine-tune your image generation process

Let’s get started.

Running Stable Diffusion in Python
Photo by Himanshu Choudhary. Some rights reserved.

Overview

This tutorial is in three parts; they are:

  • Introduction to the Diffusers Library
  • Customizing the Stable Diffusion Pipeline
  • Other Modules in the Diffusers Library

Introduction to the Diffusers Library

Stable Diffusion has taken the text-to-image generation world by storm. Its ability to produce high-quality, detailed images from textual descriptions makes it a powerful tool for artists, designers, and anyone with a creative spark. With the Stable Diffusion model file, you can rebuild the deep learning model using PyTorch, but you will need to write a lot of code to use it because there are many steps involved. The Hugging Face diffusers library lets you harness Stable Diffusion's potential and craft your own dreamlike creations with far less code.

Before you use it, you should install the diffusers library in your Python environment:
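A minimal installation might look like the following; the exact package list is an assumption, but transformers is needed for the text encoder that Stable Diffusion uses, and accelerate is commonly recommended:

```bash
# Install the diffusers library along with its usual companion packages
pip install diffusers transformers accelerate
```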

These Python packages have a lot of dependencies, including PyTorch.

In this post, you will use the pipeline function in the diffusers library. It is called a pipeline because generating a picture from your input is not the work of a single deep learning model; rather, several smaller models work in tandem to achieve it. Let’s look at an example:
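The following is a minimal sketch that matches the description below: the model comes from the CompVis/stable-diffusion-v1-4 repository in its fp16 variant, the DDPM scheduler is used for 30 steps, and the result is saved to cat.png. The prompt string is only an illustration; any text prompt will do.

```python
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler

# Build the pipeline from the official Stable Diffusion v1.4 repository,
# using the 16-bit floating point variant of the weights
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    variant="fp16",
    torch_dtype=torch.float16,
)
# Swap in the DDPM scheduler, reusing the configuration of the default scheduler
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")  # run the model on the GPU

# The prompt below is only an example
prompt = "a cat sitting on a park bench, highly detailed, photorealistic"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```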

These few lines of code are all it takes to generate a picture and save it in PNG format as cat.png. This is an example of the generated picture:

A picture generated with the Stable Diffusion pipeline.

However, a lot of work is happening behind the scenes. You passed in a text prompt. This prompt is converted into a numerical tensor by a pretrained embedding model. The tensor is then passed to the Stable Diffusion model, downloaded from the Hugging Face repository “CompVis/stable-diffusion-v1-4” (the official Stable Diffusion v1.4 model). The model is run for 30 steps with the DDPM scheduler. Its output is a floating point tensor, which has to be converted into pixel values before you can save it. All of this is accomplished by chaining the components together into the pipeline object pipe.

Customizing the Stable Diffusion Pipeline

In the previous code, you downloaded a pretrained model from the Hugging Face repository. Even within the same repository, different “variants” of the same model may be available. Usually, the default variant uses 32-bit floating point, which is suitable for running on both CPU and GPU. The variant used in the code above is fp16, which uses 16-bit floating point. It is not always available and not always named as such; you should check the corresponding repository for the details.

Because the variant used is in 16-bit floating point, you specified torch_dtype as torch.float16 as well. Note that most CPUs cannot work with 16-bit floating point (also known as half-precision floats), but GPUs can. Hence, the pipeline you created was moved to the GPU with the statement pipe.to("cuda").

You can try the following modification, with which you should observe much slower generation because it runs on the CPU:
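A sketch of such a modification, assuming the same repository and prompt as before: skip the fp16 variant and the torch_dtype override so the default 32-bit weights are used, and do not move the pipeline to the GPU.

```python
from diffusers import StableDiffusionPipeline

# Default 32-bit weights, kept on the CPU
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

prompt = "a cat sitting on a park bench, highly detailed, photorealistic"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```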

However, if you have been using the Stable Diffusion Web UI and have downloaded third-party Stable Diffusion models, you should be familiar with model files saved in SafeTensors format. These are stored differently from the Hugging Face repository above: most notably, a repository includes a config.json file describing how to use the model, whereas with a SafeTensors model file that information has to be inferred from the file itself.

You can still use the model file you downloaded, for example, with the following code:
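The path below is only a placeholder for a SafeTensors file you downloaded yourself; the rest is a sketch that keeps the same generation call as before.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path; point it at the SafeTensors file you downloaded
model_path = "./models/my-downloaded-model.safetensors"

pipe = StableDiffusionPipeline.from_single_file(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "a cat sitting on a park bench, highly detailed, photorealistic"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```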

This code uses StableDiffusionPipeline.from_single_file() instead of StableDiffusionPipeline.from_pretrained(). The argument to this function is the path to the model file, and the function will figure out that the file is in SafeTensors format. It is a neat property of the diffusers library that nothing else needs to change after you swap how the pipeline is created.

Note that each pipeline assumes a certain architecture. For example, the diffusers library provides StableDiffusionXLPipeline solely for Stable Diffusion XL. You cannot use a model file with the wrong pipeline builder.

You can see that the most important parameters of the Stable Diffusion image generation process are specified in the pipe() function call that triggers the process. For example, you can set the number of steps and the CFG scale there. The scheduler is configured on the pipeline itself and has its own set of configuration parameters. You can choose among the many schedulers supported by the diffusers library; the details are in the diffusers API manual.
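For instance, reusing the pipe and prompt objects from the earlier example, the step count and CFG scale correspond to the num_inference_steps and guidance_scale arguments (the values here are only illustrative):

```python
# Reusing the pipe and prompt created earlier
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("cat.png")
```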

For example, the following uses a faster alternative, the Euler scheduler, while keeping everything else the same:
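A sketch, again assuming the CompVis/stable-diffusion-v1-4 repository and an illustrative prompt:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    variant="fp16",
    torch_dtype=torch.float16,
)
# Replace the default scheduler with the Euler scheduler, keeping its configuration
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "a cat sitting on a park bench, highly detailed, photorealistic"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("cat.png")
```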

Other Modules in the Diffusers Library

The StableDiffusionPipeline is not the only pipeline in the diffusers library. As mentioned above, there is StableDiffusionXLPipeline for the XL models, but there are many more. For example, if you are not just providing a text prompt but invoking the Stable Diffusion model for img2img generation, you have to use StableDiffusionImg2ImgPipeline, which takes an image, as a PIL object, as an additional argument. You can check out the available pipelines in the diffusers documentation.
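A sketch of the img2img workflow mentioned above, assuming the cat.png generated earlier as the starting image and an illustrative prompt:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Start from an existing picture and let the model rework it
init_image = Image.open("cat.png").convert("RGB")
prompt = "an oil painting of a cat"
image = pipe(prompt, image=init_image, strength=0.75).images[0]
image.save("cat_painting.png")
```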

Even with the many different pipelines, you should find that all of them work similarly. The workflow closely follows the example code above, and you should find them easy to use without needing to understand the detailed mechanisms behind the scenes.

Further Reading

This section provides more resources on the topic if you want to go deeper.

Summary

In this post, you discovered how to use the diffusers library from Hugging Face. In particular, you learned:

  • How to create a pipeline to generate an image from a prompt
  • How to reuse a local model file instead of downloading one from an online repository
  • What other pipelines are available in the diffusers library
