DALL-E-2

tags
Transformers, Diffusion models, CLIP
paper
(Ramesh et al. 2022)

Architecture

This is the successor of DALL-E, it is an encoder/decoder model that uses a combination of CLIP and Diffusion models to generate images from text. The diffusion decoder is similar to GLIDE.

Parameter count

3.5B

Bibliography

  1. . . "Hierarchical Text-conditional Image Generation with CLIP Latents". arXiv. DOI.

Comments


← Back to Notes