Stable Diffusion WebUI AUTOMATIC1111: Text-to-image Guide

Introduction

Stable Diffusion is a text-to-image model. It is primarily used to generate detailed images from text descriptions. Stable Diffusion is an excellent alternative to tools like Midjourney and DALL·E 2. The great thing about this tool is that you can run it locally on your computer or use services like DreamStudio or Hugging Face.

In this guide, I will show how to use the individual settings of Stable Diffusion to achieve excellent results in image generation. I will use the Stable Diffusion web UI AUTOMATIC1111.

Prompt

The first thing we need to do is write the right prompt. A prompt is the text description you submit to the system so that it can create an image for you.

Prompt string in AUTOMATIC1111 user interface.

Generally, the more specific details you provide, the better results the system will generate for you. But finding the right prompt can be challenging. To make things easier, you can use resources like Lexica to find a relevant prompt.

Lexica is a collection of images with prompts.

So once you find a relevant image, you can click on it to see the prompt.

Prompt string along with the model and seed number.

Copy the prompt, paste it into Stable Diffusion, and press Generate to see the generated images.

Images generated by Stable Diffusion based on the prompt we’ve provided.

However, as you can see, the tool didn’t generate an exact copy of the original image. Instead, you see a few variations of the image, and that’s how Stable Diffusion works. If you want a close copy of the original image found on Lexica, you need to specify the Seed (you can read about the Seed below).
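
By the way, everything in this guide can also be scripted. Below is a minimal sketch of submitting a prompt programmatically, assuming the web UI is launched with the --api flag and listens on the default local address; the prompt itself is just an example, and your build’s exact parameter names are documented at /docs.

```python
# Minimal sketch: send a prompt to AUTOMATIC1111's txt2img endpoint.
# Assumes the web UI was started with the --api flag on 127.0.0.1:7860.
import base64
import requests

payload = {
    "prompt": "a cozy cabin in a snowy forest, warm light, highly detailed",  # example prompt
    "steps": 20,
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, image_b64 in enumerate(response.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))
```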

Model

Models, sometimes called checkpoint files, are pre-trained Stable Diffusion weights intended for generating images in general or in a particular genre. What images a model can generate depends on the data used to train it. The results you get from a given prompt may differ between Stable Diffusion models.

Installed models in the AUTOMATIC1111 user interface.
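
If you prefer to manage models from a script, here is a rough sketch under the same assumption (web UI launched with --api on the default local address). It uses the /sdapi/v1/sd-models and /sdapi/v1/options endpoints exposed by recent versions; check /docs on your build to confirm.

```python
# Sketch: list installed checkpoints and switch the active model via the API.
import requests

base = "http://127.0.0.1:7860"

# List installed models (checkpoint files).
models = requests.get(f"{base}/sdapi/v1/sd-models").json()
for m in models:
    print(m["title"])

# Switch the active checkpoint by updating the web UI options.
requests.post(f"{base}/sdapi/v1/options",
              json={"sd_model_checkpoint": models[0]["title"]})
```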

Sampling Steps

Sampling steps is the number of iterations that Stable Diffusion runs to go from random noise to a recognizable image based on the text prompt. As a very general rule of thumb, more sampling steps add more detail to your image at the cost of longer processing time. However, the optimal number of sampling steps depends on many variables, including the sampler and the kind of image you’re trying to generate.

Sampling steps in Stable Diffusion.

Quality improves as the number of sampling steps increases. Typically, 20 steps with the Euler sampler are enough to reach a high-quality, sharp image. The image will still change subtly as you step through higher values, but it becomes different rather than necessarily better. Recommendation: 20 steps. Adjust higher if you suspect the quality is low.
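
To see the effect of the step count on your own prompt, here is a sketch that renders the same prompt and seed at several step counts, again assuming the web UI runs locally with --api; the prompt, seed, and sampler are arbitrary examples.

```python
# Sketch: generate the same image at several step counts to compare detail vs. time.
import base64
import requests

base = "http://127.0.0.1:7860"
for steps in (10, 20, 30, 50):
    payload = {
        "prompt": "portrait of an astronaut, studio lighting",  # example prompt
        "seed": 42,                 # fixed seed so only the step count changes
        "sampler_name": "Euler",    # older builds use "sampler_index" instead
        "steps": steps,
    }
    r = requests.post(f"{base}/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    with open(f"steps_{steps}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))
```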

Sampling methods

There’s a variety of sampling methods you can choose from, depending on which GUI you are using. They are simply different methods for solving the diffusion equations. They are supposed to give the same result but can differ slightly due to numerical bias. Since there’s no right answer here – the only criterion is that the image looks good – the accuracy of the method should not be your concern.

Not all methods are created equal. Below are the processing times of various methods.

Processing times of various sampling methods.
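
Your numbers will depend on your hardware and model, so it is worth measuring them yourself. The sketch below, under the same --api assumption, asks the API for the available samplers and times a 20-step generation with each one; the prompt and seed are examples.

```python
# Sketch: time each available sampler on the same prompt and seed.
import time
import requests

base = "http://127.0.0.1:7860"
samplers = [s["name"] for s in requests.get(f"{base}/sdapi/v1/samplers").json()]

for name in samplers:
    payload = {"prompt": "a lighthouse at dusk", "seed": 7, "steps": 20,
               "sampler_name": name}
    start = time.time()
    requests.post(f"{base}/sdapi/v1/txt2img", json=payload).raise_for_status()
    print(f"{name}: {time.time() - start:.1f}s")
```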

Seed

The random seed determines the initial noise pattern and hence the final image. Setting it to -1 means using a random seed every time, which is useful when you want to generate new images. Fixing it, on the other hand, produces the same image in each new generation.

How do you find the seed used for an image if you use a random seed? In the generation info box below the image, you should see something like:
Steps: 30, Sampler: Euler a, CFG scale: 15, Seed: 1310942811, Size: 512x512, Model hash: e3edb8a26f, Model: ghostmix_v20Bakedvae, Version: v1.3.2

Recommendation: Set to -1 to explore. Fix to a value for fine-tuning.
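
If you generate over the API, you can also read the seed back from the response instead of the info box. A sketch, assuming --api and that the response’s info field is the JSON string returned by recent versions:

```python
# Sketch: recover the seed of a random generation and reuse it to reproduce the image.
import json
import requests

base = "http://127.0.0.1:7860"
payload = {"prompt": "a watercolor fox", "seed": -1, "steps": 20}  # example prompt

r = requests.post(f"{base}/sdapi/v1/txt2img", json=payload).json()
used_seed = json.loads(r["info"])["seed"]   # "info" is a JSON-encoded string
print("Seed used:", used_seed)

# Reusing the seed with the same settings reproduces the same image.
payload["seed"] = used_seed
r2 = requests.post(f"{base}/sdapi/v1/txt2img", json=payload).json()
```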

Width and Height

Width and Height define the size of the generated image. By default, Stable Diffusion generates images at 512 by 512 pixels. You will get the most consistent results when you use this size. You can change the size, but it will require more computational power.

Since Stable Diffusion is trained on 512×512 images, setting the output to portrait or landscape sizes can create unexpected issues. Leave it square whenever possible.

Recommendation: Set image size as 512×512.
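
To get a feel for why larger canvases are more expensive, here is some back-of-the-envelope arithmetic. It assumes the Stable Diffusion v1 latent space, which downsamples each dimension by a factor of 8 and uses 4 channels; real memory use also depends on attention and precision, so treat the ratios as rough estimates.

```python
# Rough sketch: relative cost of different canvas sizes, based on latent size alone.
def latent_elements(width: int, height: int) -> int:
    # v1 latent: (width / 8) x (height / 8) with 4 channels
    return (width // 8) * (height // 8) * 4

base = latent_elements(512, 512)
for w, h in [(512, 512), (512, 768), (768, 768), (1024, 1024)]:
    ratio = latent_elements(w, h) / base
    print(f"{w}x{h}: {latent_elements(w, h)} latent elements, ~{ratio:.1f}x the cost of 512x512")
```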

CFG Scale

Classifier-Free Guidance (CFG) scale is a parameter that controls how closely the model should follow your prompt.

1 – Mostly ignore your prompt.

3 – Be more creative.

7 – A good balance between following the prompt and freedom.

15 – Adhere more to prompt.

30 – Strictly follow the prompt.

Recommendation: Start with 7. Increase it if you want the image to follow your prompt more closely.

This setting determines how closely Stable Diffusion listens to your prompt. Let’s use the same prompt but play with the CFG scale. Lowering the scale to 1 means the AI will consider only some parts of the prompt.

Result of generation using a CFG scale of 1.

When we raise the scale toward the maximum, the AI will strictly follow the prompt. Usually, it’s better not to make the scale too high; a value like 15 works well.

Result of generation using a CFG scale of 18.
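
To run the same comparison on your own prompt, the sketch below sweeps the CFG scale with everything else fixed, under the usual --api assumption; the prompt and seed are examples.

```python
# Sketch: sweep the CFG scale to see how strictly the image follows the prompt.
import base64
import requests

base = "http://127.0.0.1:7860"
for cfg in (1, 3, 7, 15, 30):
    payload = {"prompt": "a red bicycle leaning against a blue door",  # example prompt
               "seed": 123, "steps": 20, "cfg_scale": cfg}
    r = requests.post(f"{base}/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    with open(f"cfg_{cfg}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))
```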

Batch size

Batch size is the number of images generated each time. Since the final images depend heavily on the random seed, it is always a good idea to generate a few images at a time. This way, you get a good sense of what the current prompt can do.

Recommendation: Set batch size to 4 or 8.
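
Over the API, this setting appears to map to the batch_size field (n_iter repeats batches sequentially). A short sketch under the usual --api assumption, with an example prompt:

```python
# Sketch: generate a batch of 4 variations in one request.
import base64
import requests

payload = {"prompt": "isometric illustration of a tiny island village",  # example prompt
           "batch_size": 4, "steps": 20, "seed": -1}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
for i, img in enumerate(r.json()["images"]):
    with open(f"batch_{i}.png", "wb") as f:
        f.write(base64.b64decode(img))
```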

Restore faces

A dirty little secret of Stable Diffusion is that it often has issues with faces and eyes. Restore faces is a post-processing method, applied to the generated image, that uses an AI model trained specifically to correct faces. To turn it on, check the box next to Restore faces. To choose the model, go to the Settings tab and, under Face restoration model, select CodeFormer.

Recommendation: Turn restore faces on when you generate images with faces.
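
If you are scripting, face restoration corresponds to the restore_faces flag on a generation plus a face restoration option in the settings. A sketch under the usual --api assumption; the option key is taken from the web UI settings and may differ between versions, so verify it with GET /sdapi/v1/options.

```python
# Sketch: select CodeFormer as the face restoration model, then request restoration.
import requests

base = "http://127.0.0.1:7860"

# Option key assumed from the web UI Settings tab; verify via GET /sdapi/v1/options.
requests.post(f"{base}/sdapi/v1/options", json={"face_restoration_model": "CodeFormer"})

payload = {"prompt": "close-up portrait of an elderly sailor, natural light",  # example prompt
           "steps": 20, "restore_faces": True}
requests.post(f"{base}/sdapi/v1/txt2img", json=payload).raise_for_status()
```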

Tiling

Use the Tiling option to produce a periodic image that can be tiled. Below is an example.

An example image generated with the Tiling option.
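
A quick way to check that a texture really tiles is to paste it into a 2×2 grid and look for seams. The sketch below requests a tiled image over the API (same --api assumption) and builds the preview with Pillow, which needs to be installed separately; the prompt is an example.

```python
# Sketch: request a tileable texture and preview it as a 2x2 grid.
import base64
import io

import requests
from PIL import Image

payload = {"prompt": "seamless pattern of autumn leaves, flat illustration",  # example prompt
           "steps": 20, "tiling": True}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
tile = Image.open(io.BytesIO(base64.b64decode(r.json()["images"][0])))

# Paste the tile into a 2x2 grid to inspect the seams visually.
grid = Image.new("RGB", (tile.width * 2, tile.height * 2))
for dx in (0, tile.width):
    for dy in (0, tile.height):
        grid.paste(tile, (dx, dy))
grid.save("tiled_preview.png")
```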

Hires. fix

When we enable Hires. fix (high-resolution fix), the AI first generates the image at the base resolution as a draft, then enlarges it with an AI upscaling algorithm (Upscale) to the multiple or width and height we specify, and finally redraws the entire enlarged image with image-to-image generation (img2img) to produce the high-resolution image we want.

The Hires. fix settings in AUTOMATIC1111.

Upscaler: the AI upscaling algorithm to use, the same upscalers offered by the Extras tab.

Hires steps: how many sampling steps to run on the enlarged image. If set to 0, the same number of steps as the first pass is used. I usually use 20 to 30 steps.

Denoising strength: specifies how much noise is added to the enlarged image before it is redrawn. 0 means no noise is added, so the image is not redrawn at all; 1 means the image is entirely replaced by random noise, producing a result unrelated to the original. At around 0.5 you will see significant changes in color, light, and shadow; at 0.75, even the structure and the poses of the characters change noticeably.

Upscale by: the magnification factor; 2 means both the width and height are doubled.
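
For reference, these settings map roughly onto fields of the txt2img API payload as sketched below (same --api assumption; the upscaler name and the values are illustrative, and field names can vary between versions, so check /docs).

```python
# Sketch: a txt2img request with Hires. fix enabled.
import requests

payload = {
    "prompt": "a castle on a cliff at sunrise, detailed matte painting",  # example prompt
    "steps": 25,
    "enable_hr": True,              # turn on Hires. fix
    "hr_upscaler": "Latent",        # Upscaler
    "hr_second_pass_steps": 20,     # Hires steps
    "denoising_strength": 0.5,      # Denoising strength
    "hr_scale": 2,                  # Upscale by
}
requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload).raise_for_status()
```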

Summary

In this article, we have covered the basic parameters of Stable Diffusion AI. Now you know how to control your images with the CFG scale, sampling steps, sampling methods, seed, image width and height, batch size, and more. Check out this article for a step-by-step guide to building high-quality prompts, and this article for more advanced prompting techniques.