VAE
A VAE (Variational Autoencoder) is the component that converts images between full-resolution pixels and the compressed latent space a diffusion model works in - encoding on the way in and decoding on the way out.
Modern image models do not work directly on pixels - that would be far too slow. Instead they operate in a compact latent space. The VAE is the translator that makes this possible: its encoder compresses an image into latents, and its decoder turns finished latents back into a full-resolution picture.
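To make the size difference concrete, here is a minimal sketch of the arithmetic. It assumes a VAE that shrinks each side of the image by a factor of 8 and uses 4 latent channels, which is the layout used by Stable Diffusion 1.x; other models use different values. The function names `latent_shape` and `compression_ratio` are made up for this illustration.

```python
def latent_shape(height, width, downscale=8, latent_channels=4):
    """Return (channels, height, width) of the latent for an image of this size.

    downscale=8 and latent_channels=4 are assumptions (Stable Diffusion 1.x style).
    """
    if height % downscale or width % downscale:
        raise ValueError("image sides must be divisible by the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def compression_ratio(height, width, image_channels=3, downscale=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    pixel_values = image_channels * height * width
    c, h, w = latent_shape(height, width, downscale, latent_channels)
    return pixel_values / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions a 512x512 RGB image (786,432 values) becomes a 4x64x64 latent (16,384 values), so the diffusion model handles 48 times fewer numbers per step.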
How it fits in
- Text-to-image: the diffusion model denoises in latent space, then the VAE decoder renders the final pixels.
- Image-to-image: the VAE encoder turns your input photo into latents so the model can transform it.
- Inpainting: the masked region is processed in latents and decoded back through the VAE.
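The workflows above differ mainly in where the encoder and decoder are called. The sketch below shows that ordering for text-to-image and image-to-image with toy stand-ins: `toy_encode`, `toy_decode` and `toy_denoise` are hypothetical placeholders (block averaging, pixel repetition and value clamping), not a real VAE or diffusion model, and images are plain 2D lists of floats. Inpainting follows the image-to-image path, with a mask deciding which latents are changed.

```python
FACTOR = 2  # toy downscale factor (assumption); real VAEs commonly use 8


def toy_encode(image):
    """Stand-in encoder: average each FACTOR x FACTOR pixel block into one latent value.

    Assumes both image dimensions are divisible by FACTOR.
    """
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] for dy in range(FACTOR) for dx in range(FACTOR))
            / FACTOR**2
            for x in range(0, width, FACTOR)
        ]
        for y in range(0, height, FACTOR)
    ]


def toy_decode(latents):
    """Stand-in decoder: repeat each latent value into a FACTOR x FACTOR pixel block."""
    image = []
    for row in latents:
        wide_row = [value for value in row for _ in range(FACTOR)]
        image.extend(list(wide_row) for _ in range(FACTOR))
    return image


def toy_denoise(latents):
    """Stand-in for the diffusion model: simply clamps latent values to [0, 1]."""
    return [[min(1.0, max(0.0, value)) for value in row] for row in latents]


def text_to_image(initial_latents):
    """Text-to-image: work in latent space, then decode once at the end."""
    return toy_decode(toy_denoise(initial_latents))


def image_to_image(image):
    """Image-to-image: encode the input first, transform in latents, then decode."""
    return toy_decode(toy_denoise(toy_encode(image)))


photo = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
print(image_to_image(photo))
print(text_to_image([[0.5, 2.0]]))
```

Note that the encoder appears only in the image-to-image path: text-to-image starts from latents directly, so it needs just the decoder.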
Why it matters
The VAE affects color accuracy and fine detail. A poor or mismatched VAE can produce washed-out colors, slightly blurry output or odd artifacts; the right one gives crisp, vivid results. Most checkpoints ship with a baked-in VAE, so you rarely have to think about it - but it is the reason a "latent" model can ever show you actual pixels.
Related terms
- Latent space: the compressed, abstract representation a diffusion model works in. Instead of manipulating millions of pixels, the model generates in this smaller space and then decodes it into an image.
- Diffusion model: the type of AI that powers most modern image generators. It learns to turn random noise into a coherent image by reversing a step-by-step noising process.
- Checkpoint: a saved AI model file containing the full set of trained weights. It is the complete "brain" that generates images - swapping checkpoints changes the entire look and capability.
- Denoising: the core operation of a diffusion model. At each step it predicts and removes a little noise, gradually turning a random field into a clear image.