- tags
- Transformers, GPT
- paper
- (Rae et al. 2022)
Architecture
This model is very similar to GPT-2 but uses RMSNorm instead of LayerNorm and relative positional encodings rather than absolute positional encodings (see the sketch below).
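For reference, RMSNorm (Zhang & Sennrich 2019) drops LayerNorm's mean subtraction and bias term, rescaling activations only by their root mean square. A minimal PyTorch sketch of the idea; the class name, `eps` default, and interface here are illustrative assumptions, not Gopher's actual implementation:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (illustrative sketch).

    Unlike LayerNorm, there is no mean-centering and no bias:
    activations are divided by their RMS and rescaled by a
    learned per-feature gain.
    """

    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))  # learned per-feature scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS over the feature dimension; no mean subtraction, no bias
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain
```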
Parameter count
280B
Bibliography
- Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. 2022. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv.