- tags: Transformers, GPT, NLP
- blog post: BLOOM announcement blog post
Architecture
The architecture is similar to GPT-3's, but it uses full (dense) attention in every layer instead of the sparse attention patterns GPT-3 applies in alternating layers.
Parameter count
176B