How does Stable Diffusion work Flashcards
Source of Info
https://stable-diffusion-art.com/how-stable-diffusion-work/
Diffusion Model
Stable Diffusion belongs to a class of ____ ________ models called _________ models.
They are __________ models, meaning they are designed to ________ new data similar to what they have ____ in ________.
In the case of Stable Diffusion, the data are ______.
Diffusion Model
Stable Diffusion belongs to a class of deep learning models called diffusion models.
They are generative models, meaning they are designed to generate new data similar to what they have seen in training.
In the case of Stable Diffusion, the data are images.
Why is it called the diffusion model?
Because its math looks very much like diffusion in physics.
What is forward diffusion?
A forward diffusion process adds noise to a training image, gradually turning it into an uncharacteristic noise image.
The forward process will turn any cat or dog image into a noise image.
Eventually, you won’t be able to tell whether the image was originally a dog or a cat.
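A minimal sketch of forward diffusion on a toy image, using NumPy. The update rule (shrink the image slightly, then add a little Gaussian noise at each step), the value of `beta`, and the 64x64 test pattern are assumptions chosen for illustration; the card itself only says that noise is added gradually.

```python
import numpy as np

def forward_diffusion(image, n_steps, beta=0.02, seed=0):
    """Return the list of images produced by repeatedly adding Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = image.astype(np.float64)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Assumed update: scale the image down a little and mix in fresh noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        trajectory.append(x.copy())
    return trajectory

def similarity(a, b):
    """Pearson correlation between two images, flattened to 1-D."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Hypothetical stand-in for a training photo: a bright square on a dark background.
image = -np.ones((64, 64))
image[16:48, 16:48] = 1.0

trajectory = forward_diffusion(image, n_steps=500)
early_similarity = similarity(image, trajectory[10])   # still recognizable
late_similarity = similarity(image, trajectory[-1])    # essentially pure noise
print(f"after 10 steps: {early_similarity:.2f}, after 500 steps: {late_similarity:.2f}")
```

The correlation with the starting image stays high after a few steps and falls to roughly zero by the end, which is the "can't tell what it was" state the card describes.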
What is reverse diffusion?
Starting from a noisy, meaningless image, reverse diffusion recovers a cat OR a dog image.
Technically, every diffusion process has two parts: (1) drift and (2) random motion.
The reverse diffusion drifts towards either cat OR dog images but nothing in between. That’s why the result can either be a cat or a dog.
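The drift-plus-random-motion idea can be illustrated with a toy 1-D process. This is only an analogy: the hand-written double-well drift below stands in for the learned drift in Stable Diffusion, and the positions -1 ("cat") and +1 ("dog"), the step size, and the fading noise schedule are all assumptions made for the sketch.

```python
import numpy as np

def double_well_drift(x):
    # Negative gradient of U(x) = (x^2 - 1)^2: pulls x toward -1 or +1,
    # and pushes it away from the in-between point x = 0.
    return -4.0 * x * (x**2 - 1.0)

def simulate(n_samples=1000, n_steps=500, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # start from pure noise
    for i in range(n_steps):
        noise_scale = 0.5 * (1.0 - i / n_steps)  # random motion fades out over time
        random_motion = np.sqrt(dt) * noise_scale * rng.standard_normal(n_samples)
        x = x + dt * double_well_drift(x) + random_motion
    return x

samples = simulate()
near_cat = float(np.mean(np.abs(samples + 1.0) < 0.25))  # fraction ending near -1
near_dog = float(np.mean(np.abs(samples - 1.0) < 0.25))  # fraction ending near +1
print(f"near cat: {near_cat:.2f}, near dog: {near_dog:.2f}")
```

Almost every sample ends up close to one of the two targets and very few remain in between, mirroring the "either a cat or a dog" behaviour described above.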
How is training done?
- By teaching a neural network model to predict the noise added.
- It is called the noise predictor in Stable Diffusion. It is a U-Net model.
Explain the 4 steps in stable diffusion training.
- Pick a training image, like a photo of a cat.
- Generate a random noise image.
- Corrupt the training image by adding this noise image over a certain number of steps.
- Teach the noise predictor to tell us how much noise was added. This is done by tuning its weights and showing it the correct answer.
After training, we have a noise predictor capable of estimating the noise added to an image.
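The four training steps can be sketched with a toy noise predictor. Everything here is a simplification assumed for illustration: the "images" are 8-pixel vectors, the predictor is a linear model rather than a U-Net, and a single fixed noise strength replaces the varying number of noising steps.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                # toy "image" with 8 pixels (assumption)
NOISE_STRENGTH = 1.0   # single fixed noise level (assumption)

# Hypothetical training set: one "cat" image and one "dog" image.
training_images = np.stack([np.ones(DIM), -np.ones(DIM)])

def make_batch(batch_size):
    idx = rng.integers(0, len(training_images), size=batch_size)
    clean = training_images[idx]                     # step 1: pick training images
    noise = rng.standard_normal((batch_size, DIM))   # step 2: generate random noise
    noisy = clean + NOISE_STRENGTH * noise           # step 3: corrupt the images
    return noisy, noise

# Linear stand-in for the noise predictor: predicted_noise = noisy @ W + b.
W = np.zeros((DIM, DIM))
b = np.zeros(DIM)

def predict_noise(noisy):
    return noisy @ W + b

def loss_on(noisy, noise):
    return float(np.mean((predict_noise(noisy) - noise) ** 2))

test_noisy, test_noise = make_batch(2000)
initial_loss = loss_on(test_noisy, test_noise)

learning_rate = 0.2
for _ in range(1000):
    noisy, noise = make_batch(64)
    # Step 4: compare the prediction with the correct answer and tune the weights
    # by gradient descent on the mean squared error.
    error = predict_noise(noisy) - noise
    W -= learning_rate * 2.0 * (noisy.T @ error) / error.size
    b -= learning_rate * 2.0 * error.sum(axis=0) / error.size

final_loss = loss_on(test_noisy, test_noise)
print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The loss on held-out noisy images drops well below its starting value, meaning the model has learned to estimate the noise that was added.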
How does reverse diffusion work?
- We first generate a completely random image and ask the noise predictor to tell us the noise.
- We then subtract this estimated noise from the original image.
- Repeat this process a few times. You will get an image of either a cat or a dog.
This image is unconditioned.
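The loop above can be sketched as follows. The `toy_noise_predictor` is a hypothetical stand-in for the trained U-Net: it simply treats the offset from the nearest of two known 8-pixel "images" as the noise. The step size and step count are also assumptions; real samplers follow a noise schedule.

```python
import numpy as np

cat = np.ones(8)     # hypothetical 8-pixel "cat" image
dog = -np.ones(8)    # hypothetical 8-pixel "dog" image
prototypes = np.stack([cat, dog])

def toy_noise_predictor(image):
    # Stand-in for the trained noise predictor: assume the image is the
    # nearest prototype plus noise, and return that estimated noise.
    distances = np.linalg.norm(prototypes - image, axis=1)
    nearest = prototypes[np.argmin(distances)]
    return image - nearest

def reverse_diffusion(n_steps=20, step_size=0.3, seed=0):
    rng = np.random.default_rng(seed)
    image = rng.standard_normal(8)             # completely random starting image
    for _ in range(n_steps):
        predicted_noise = toy_noise_predictor(image)
        image = image - step_size * predicted_noise   # subtract part of the estimate
    return image

results = [reverse_diffusion(seed=s) for s in range(5)]
for result in results:
    label = "cat" if np.allclose(result, cat, atol=0.05) else "dog"
    print(label, np.round(result, 2))
```

Each run starts from different random noise and ends at either the cat or the dog, with nothing steering which one comes out. That lack of steering is what "unconditioned" means here.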
How large is the Stable Diffusion model?
The Stable Diffusion v1.5 model file is 4.27GB, but the total size depends on which models you install on your computer.
The basic setup (including Python, Git for Windows, and WebUI) takes roughly 9.37GB of HDD/SSD space.
https://okuha.com/vram-requirements-for-stable-diffusion/
What system requirements do you need to run SD? (2)
- Roughly 10GB of HDD/SSD space and roughly 8GB of GPU VRAM.
- You must also reserve some space for installing Git and Python, as Stable Diffusion requires those to work properly.
How much VRAM should I get?
16GB lets you run Stable Diffusion faster and produce larger images and more intricate results.
It can still work with less VRAM, but generation is slower and the images are smaller.
https://okuha.com/vram-requirements-for-stable-diffusion/
Can you run SD without a graphics card? (2)
- Not practically: Stable Diffusion is built to run on a graphics card, and CPU-only inference is very slow.
- This is because the software relies on the GPU to calculate and render digital AI art.