Chinchilla

tags
Transformers, GPT, NLP
paper
(Hoffmann et al. 2022)

Architecture

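Chinchilla closely follows the Gopher architecture but is considerably smaller, at 70B parameters against Gopher's 280B. It is trained on more data, so that a comparable compute budget is used more efficiently.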

Parameter count

70B
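
To put the 70B figure in context, here is a rough sketch of the compute arithmetic. It uses the common rule of thumb that training compute is about 6 FLOPs per parameter per token, which is an approximation and not an exact accounting. The token counts (about 1.4T for Chinchilla, about 300B for Gopher) and Gopher's 280B parameters are assumptions taken from the commonly reported figures, not from this note. The function name `training_flops` is hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the C ~ 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


# Assumed figures (not stated in this note): parameter and token counts.
chinchilla_params = 70e9
chinchilla_tokens = 1.4e12
gopher_params = 280e9
gopher_tokens = 300e9

chinchilla_flops = training_flops(chinchilla_params, chinchilla_tokens)
gopher_flops = training_flops(gopher_params, gopher_tokens)

# Training tokens seen per parameter for each model.
chinchilla_tokens_per_param = chinchilla_tokens / chinchilla_params
gopher_tokens_per_param = gopher_tokens / gopher_params

print(f"Chinchilla: {chinchilla_flops:.2e} FLOPs, "
      f"{chinchilla_tokens_per_param:.1f} tokens/param")
print(f"Gopher:     {gopher_flops:.2e} FLOPs, "
      f"{gopher_tokens_per_param:.1f} tokens/param")
```

Under these assumptions the two models land in the same order of magnitude of training compute, while Chinchilla sees far more tokens per parameter. This is the sense in which it is the smaller and more efficient of the two.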

Bibliography

  1. Hoffmann et al. 2022. "Training Compute-Optimal Large Language Models". arXiv.
