DALL-E-2

tags: Transformers, Diffusion models, CLIP
paper: (Ramesh et al. 2022)

Architecture

This is the successor of DALL-E, it is an encoder/decoder model that uses a combination of CLIP and Diffusion models to generate images from text. The diffusion decoder is similar to GLIDE.

Parameter count

3.5B

Bibliography

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. April 12, 2022. "Hierarchical Text-conditional Image Generation with CLIP Latents". arXiv. DOI.

Architecture

Parameter count

Bibliography

Comments

Leave a comment