# DALL-E-2

tags
Transformers, Diffusion models, CLIP
paper
(Ramesh et al. 2022)

## Architecture

This is the successor of DALL-E, it is an encoder/decoder model that uses a combination of CLIP and Diffusion models to generate images from text. The diffusion decoder is similar to GLIDE.

3.5B

## Bibliography

1. . . "Hierarchical Text-conditional Image Generation with CLIP Latents". arXiv. DOI.
Last changed | authored by