Fine-Tuning Stable Diffusion with LoRA

Stable Diffusion can generate an image from your text input. There are many models that share a similar architecture and pipeline, yet their output can be quite different. There are also many ways to adjust their behavior, for example, to make the output follow a particular style by default whenever you give a prompt. LoRA is one such technique that does not require you to recreate a large model. In this post, you will see how you can create a LoRA of your own.

After finishing this post, you will learn:

  • How to prepare and train a LoRA model
  • How to use the trained LoRA in Python

Let’s get started.

Fine-tuning Stable Diffusion with LoRA
Photo by Thimo Pedersen. Some rights reserved.

Overview

This post is in three parts; they are

  • Preparation for Training a LoRA
  • Training a LoRA with Diffusers Library
  • Using Your Trained LoRA

Preparation for Training a LoRA

We covered the idea of using LoRA in the Web UI in a previous post. If you want to create your own LoRA, a plugin in the Web UI allows you to do that, or you can create one with your own program. Since all training is computationally intensive, be sure you have a machine with a GPU to continue.

We will use the training script from the example directory of the diffusers library. Before you start, you have to set up the environment by installing the required Python libraries, using the following commands:
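The package list below is one reasonable minimum; the examples directory of diffusers also carries a requirements.txt that you may install instead.

    pip install git+https://github.com/huggingface/diffusers
    pip install accelerate transformers datasets torchvision
    accelerate env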

The first command installs the diffusers library from GitHub, which is the development version. This is required because you will use the training script from GitHub, so you should use the matching version of the library.

The last command above confirms that you have installed the accelerate library and detects the GPU on your computer. You have downloaded and installed many libraries. You can run the Python statements below to confirm that they are all installed correctly and that you have no import error:
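A minimal check such as the following should run without errors (the exact set of modules is up to you; the ones below cover the main libraries the training script depends on):

    import torch
    import accelerate
    import datasets
    import diffusers
    import transformers

    # Print the library version and whether PyTorch can see the GPU
    print("diffusers:", diffusers.__version__)
    print("CUDA available:", torch.cuda.is_available())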

You will use the LoRA training script from the examples of diffusers. Let’s download the script first:
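At the time of writing, the script is located under examples/text_to_image in the diffusers repository; adjust the path if it has moved:

    wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py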

Training a LoRA with Diffusers Library

For fine-tuning, you will be using the Pokémon BLIP captions with English and Chinese dataset on the base model runwayml/stable-diffusion-v1-5 (the official Stable Diffusion v1.5 model). You can adjust hyperparameters to suit your specific use case, but you can start with the following Linux shell commands.
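The script below is a sketch of such a run. The dataset repository name and the hyperparameter values are assumptions you should replace with your own choices; the flags are those accepted by train_text_to_image_lora.py.

    export MODEL_NAME="runwayml/stable-diffusion-v1-5"
    export DATASET_NAME="svjack/pokemon-blip-captions-en-zh"   # assumed dataset repo; replace with the one you use
    export OUTPUT_DIR="./finetune_lora/pokemon"

    mkdir -p $OUTPUT_DIR

    accelerate launch --mixed_precision="bf16" train_text_to_image_lora.py \
      --pretrained_model_name_or_path=$MODEL_NAME \
      --dataset_name=$DATASET_NAME \
      --caption_column="en_text" \
      --resolution=512 \
      --random_flip \
      --train_batch_size=1 \
      --gradient_accumulation_steps=4 \
      --max_train_steps=15000 \
      --learning_rate=1e-04 \
      --lr_scheduler="cosine" \
      --lr_warmup_steps=0 \
      --checkpointing_steps=500 \
      --seed=42 \
      --output_dir=$OUTPUT_DIR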

Running this command will take hours to complete, even with a high-end GPU. But let’s look closer at what this does.

The accelerate command helps you launch the training across multiple GPUs. It does no harm if you have just one. Many modern GPUs support the “Brain Float 16” floating point format introduced by the Google Brain project. If it is supported, the option --mixed_precision="bf16" will save memory and run faster.

The training script downloads the dataset from the Hugging Face Hub and uses it to train a LoRA model. The batch size, training steps, learning rate, and so on are the hyperparameters of the training. The trained model is checkpointed once every 500 steps to the output directory.

Training a LoRA requires a dataset with images (pixels) and corresponding captions (text). The caption text describes the image, and the trained LoRA learns that such captions should mean such images. If you check out the dataset on the Hugging Face Hub, you will see that the caption column is named en_text, which is what --caption_column is set to above.

If you are providing your own dataset instead (e.g., you manually created captions for the images you gathered), you should create a CSV file metadata.csv whose first column is named file_name and whose second column holds your text captions, like the following:
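The file names and captions below are made-up placeholders:

    file_name,text
    image_0.png,"a cartoon character with a green body and red eyes"
    image_1.png,"a drawing of a blue and white bird with a long tail"
    image_2.png,"a small yellow creature with large pointed ears"

With this layout, --caption_column should be set to text, i.e., the header you chose for the caption column.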

Keep this CSV together with all your images (matching the file_name column) in the same directory, and use the directory name as your dataset name.

There will be many subdirectories and files created under the directory assigned to OUTPUT_DIR in the script above. Each checkpoint contains the full Stable Diffusion model weights as well as the extracted LoRA safetensors. Once you finish the training, you can delete all of them except the final LoRA file, pytorch_lora_weights.safetensors.

Using Your Trained LoRA

Running a Stable Diffusion pipeline with a LoRA requires only a small modification to your Python code. An example would be the following:
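Below is a minimal sketch; the prompt and inference settings are arbitrary choices for illustration:

    import torch
    from diffusers import StableDiffusionPipeline

    model_path = "pcuenq/pokemon-lora"

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    # Attach the LoRA weights to the attention processors of the U-Net
    pipe.unet.load_attn_procs(model_path)
    pipe.to("cuda")

    prompt = "Green pokemon with menacing face"
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("pokemon.png")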

The code above downloads a LoRA from the Hugging Face Hub repository pcuenq/pokemon-lora and attaches it to the pipeline with the line pipe.unet.load_attn_procs(model_path). The rest is just as usual. The image generated may look like the following:

Green pokemon as generated

This is the more verbose way of using a LoRA, since you have to know that this particular LoRA should be loaded into the attention processors of the pipeline’s unet component. Such details should be found in the model card of the repository.

An easier way of using a LoRA is the auto pipeline, which infers the model architecture from the model file:
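A sketch of this approach follows; the directory path assumes the OUTPUT_DIR used in the training script above, and the prompt is again just an example:

    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Point to the directory and file name of the trained LoRA weights
    pipe.load_lora_weights(
        "finetune_lora/pokemon", weight_name="pytorch_lora_weights.safetensors"
    )

    image = pipe("Green pokemon with menacing face", num_inference_steps=30).images[0]
    image.save("pokemon_lora.png")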

The parameters to load_lora_weights() are the directory name and the file name of your trained LoRA file. This works for other LoRA files too, such as those you download from Civitai.

Further Reading

This section provides more resources on the topic if you want to go deeper.

Summary

In this post, you saw how to create your own LoRA model, given a set of images and their description text. This is a time-consuming process, but the result is a small weight file that can modify the behavior of the diffusion model. You learned how to run the training of a LoRA using the diffusers library. You also saw how to use a LoRA weight in your Stable Diffusion pipeline code.
