Gopher

tags
Transformers, GPT
paper
(Rae et al. 2022)

Architecture

This model is very similar to GPT-2 but uses RMSNorm instead of LayerNorm and relative positional encoding (as in Transformer-XL) rather than absolute positional encoding.
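RMSNorm normalizes by the root-mean-square of the activations only, dropping LayerNorm's mean subtraction and bias term. A minimal NumPy sketch of the idea (illustrative only, not the paper's implementation; the `gain` parameter is the learned per-feature scale):

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # RMSNorm: rescale by the root-mean-square of the last axis.
    # Unlike LayerNorm there is no mean subtraction and no bias.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

x = np.array([3.0, -4.0])
y = rms_norm(x, gain=np.ones(2))
# After normalization the RMS of y is ~1.
```

With `gain` fixed to ones the output always has unit RMS; in practice `gain` is a trained vector, one scale per hidden dimension.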

Parameter count

280B

Bibliography

  1. Rae, Jack W., et al. 2022. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446.
