- tags: Transformers, GPT, NLP
- paper: (Hoffmann et al. 2022)
Architecture
The architecture is essentially the same as Gopher's, apart from minor training changes (e.g. AdamW instead of Adam). The main difference is scale: following the paper's compute-optimal analysis, the model is about 4x smaller than Gopher (70B vs. 280B parameters) but trained on about 4x more data (~1.4T vs. 300B tokens) for roughly the same compute budget.
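As a rough illustration (a sketch, not the paper's code), the model's sizing can be recovered from the common C ≈ 6ND FLOPs approximation together with the paper's result that parameters and tokens should be scaled in equal proportion with compute, which works out to roughly 20 training tokens per parameter; the `tokens_per_param` default below is that rule of thumb, not a value from the paper itself:

```python
# Minimal sketch of compute-optimal model sizing, assuming:
#   C ≈ 6 * N * D   (training FLOPs ≈ 6 × parameters × tokens)
#   D ≈ 20 * N      (rule of thumb implied by the paper's scaling fits)

def compute_optimal(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that spend flops_budget compute-optimally.

    Solving C = 6 * N * D with D = tokens_per_param * N gives
    N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Gopher's training budget, as reported in the paper: ~5.76e23 FLOPs.
params, tokens = compute_optimal(5.76e23)
print(f"params = {params / 1e9:.0f}B, tokens = {tokens / 1e12:.2f}T")
# -> roughly 69B parameters and 1.4T tokens, matching this model's setup.
```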
Parameter count
70B
Bibliography
- Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, et al. 2022. "Training Compute-Optimal Large Language Models". arXiv:2203.15556.