Stable Diffusion Pipeline

Course Notes 2022 (fastai)

StableDiffusionPipeline is an end-to-end diffusion inference pipeline that allows you to start generating images with just a few lines of code. Many Hugging Face libraries (along with other libraries such as scikit-learn) use the concept of a "pipeline" to indicate a sequence of steps that when combined complete some task.
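The "pipeline" idea can be illustrated with a minimal sketch in plain Python. The three stage functions below (`encode_prompt`, `denoise`, `decode`) are hypothetical toy stand-ins for a text encoder, a denoising loop, and an image decoder; they are not the real StableDiffusionPipeline internals and only show how separate steps are chained into one callable.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence

Step = Callable[[Any], Any]


def make_pipeline(steps: Sequence[Step]) -> Step:
    """Compose steps left to right into a single callable."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


# Hypothetical toy stages, for illustration only.
def encode_prompt(prompt: str) -> List[float]:
    # Stand-in for a text encoder: one number per word.
    return [float(len(word)) for word in prompt.split()]


def denoise(latents: List[float]) -> List[float]:
    # Stand-in for the iterative denoising loop.
    return [x * 0.5 for x in latents]


def decode(latents: List[float]) -> List[int]:
    # Stand-in for the image decoder.
    return [int(x * 2) for x in latents]


toy_pipeline = make_pipeline([encode_prompt, denoise, decode])
result = toy_pipeline("a photo of an astronaut")
```

The caller only sees `toy_pipeline(prompt)`; the individual stages stay hidden behind that single call, which is the convenience the pipeline concept is meant to provide.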

Here is a short fastai introduction to Stable Diffusion pipelines → video. It demonstrates how random noise is gradually transformed into a defined structure.

Link to sources

Classifier-free guidance is a method to increase the adherence of the output to the conditioning signal (here, the text prompt). Roughly speaking, the larger the guidance scale, the more closely the model tries to represent the text prompt. However, large values tend to produce less diverse outputs. The default is 7.5, which is a good compromise between variety and fidelity. This blog post goes into deeper details on how it works.
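As a rough sketch of the arithmetic behind the guidance scale: at each denoising step the model makes one noise prediction without the text and one with it, and the two are combined by extrapolating from the unconditioned prediction toward the text-conditioned one. The function name and the short lists standing in for noise tensors are assumptions made for illustration.

```python
from typing import List


def classifier_free_guidance(
    noise_uncond: List[float],
    noise_text: List[float],
    guidance_scale: float = 7.5,
) -> List[float]:
    """Combine unconditioned and text-conditioned noise predictions.

    guidance_scale = 0 ignores the text, 1 returns the text-conditioned
    prediction unchanged, and larger values push further in the
    direction the text suggests.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for real noise predictions.
noise_uncond = [0.0, 1.0]
noise_text = [1.0, 3.0]

guided = classifier_free_guidance(noise_uncond, noise_text)
```

Because the difference `t - u` is multiplied by the scale, large values amplify whatever the prompt contributes, which is consistent with stronger prompt adherence and less variety.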

→ video

Link to sources

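Negative prompting replaces the completely unconditioned generation used in classifier-free guidance with the generation for a second (negative) prompt, and scales the difference between the generation for that prompt and the conditioned generation. The output is pushed toward the main prompt and away from the negative one.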
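A minimal sketch of that substitution, assuming the usual guidance formula (baseline plus scaled difference): the only change from plain guidance is which prediction serves as the baseline. The helper name `guide` and the toy numbers are hypothetical.

```python
from typing import List


def guide(
    noise_baseline: List[float],
    noise_prompt: List[float],
    guidance_scale: float = 7.5,
) -> List[float]:
    """Move from the baseline prediction toward the prompt prediction."""
    return [b + guidance_scale * (p - b) for b, p in zip(noise_baseline, noise_prompt)]


# Toy values standing in for real noise predictions.
noise_prompt = [1.0, 1.0]      # conditioned on the main prompt
noise_empty = [0.0, 0.0]       # conditioned on the empty prompt
noise_negative = [2.0, -1.0]   # conditioned on the negative prompt

# Plain guidance: the baseline is the empty-prompt prediction.
plain = guide(noise_empty, noise_prompt, guidance_scale=2.0)

# Negative prompting: the baseline is the negative-prompt prediction,
# so the result moves away from it and toward the main prompt.
with_negative = guide(noise_negative, noise_prompt, guidance_scale=2.0)
```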

→ video

Link to sources

Image to image: Even though Stable Diffusion was trained to generate images, optionally driving the generation with text conditioning, we can use the raw image diffusion process for other tasks. For example, instead of starting from pure noise, we can start from an image and add a certain amount of noise to it. In effect, we replace the initial denoising steps and pretend that our noised image is what the algorithm came up with. Then we continue the diffusion process from that state as usual. This usually preserves the composition, although details may change a lot. It's great for sketches!
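The "add noise, then skip the initial steps" idea can be sketched as follows. This is a simplified illustration under stated assumptions: `strength` (between 0 and 1) is a hypothetical parameter controlling how much noise is added, the image is a flat list of floats, and the linear noise schedule is a toy choice rather than the schedule of any real scheduler.

```python
import math
import random
from typing import List, Tuple


def prepare_img2img_start(
    image: List[float],
    strength: float,
    num_steps: int,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Noise `image` and return (noisy_image, index of the first step to run)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")

    # Higher strength means more noise and more denoising steps left to run.
    steps_to_run = min(int(num_steps * strength), num_steps)
    start_step = num_steps - steps_to_run

    # Toy schedule (assumption): the surviving signal fraction shrinks
    # linearly with the number of steps that will be run.
    signal = 1.0 - steps_to_run / num_steps
    rng = random.Random(seed)
    noisy = [
        math.sqrt(signal) * px + math.sqrt(1.0 - signal) * rng.gauss(0.0, 1.0)
        for px in image
    ]
    return noisy, start_step


sketch = [0.2, 0.8, 0.5]
noisy, start_step = prepare_img2img_start(sketch, strength=0.75, num_steps=50)
# The denoising loop would then run steps start_step .. num_steps - 1 only.
```

With a strength of 0 the image is returned untouched and no steps are run; with a strength of 1 the image is replaced by pure noise and every step runs, which corresponds to ordinary generation from scratch.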

→ video

Link to sources