GPT-3

tags
Transformers, NLP, GPT
paper
(Brown et al. 2020)

Architecture

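Same architecture as GPT-2, except that the transformer layers alternate between dense and locally banded sparse attention patterns, similar to the Sparse Transformer.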
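
To make "locally banded" concrete, here is a minimal sketch of a dense causal mask next to a banded one, alternated across layers as the paper describes. The function names and the band width are hypothetical choices for illustration; this note does not record the actual window size or implementation used in GPT-3.

```python
def dense_causal_mask(seq_len):
    """Position i may attend to every position j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len, band_width):
    """Position i may attend only to the last `band_width` positions, itself included."""
    return [[0 <= i - j < band_width for j in range(seq_len)]
            for i in range(seq_len)]


def layer_masks(n_layers, seq_len, band_width):
    """Alternate dense and banded masks by layer (assumed: even layers are dense)."""
    dense = dense_causal_mask(seq_len)
    banded = banded_causal_mask(seq_len, band_width)
    return [dense if layer % 2 == 0 else banded for layer in range(n_layers)]


if __name__ == "__main__":
    # band_width=2 is an illustrative value, not a figure from the paper.
    for row in banded_causal_mask(seq_len=5, band_width=2):
        print("".join("x" if allowed else "." for allowed in row))
```

In the banded mask each row has at most `band_width` allowed positions, so attention cost per layer grows linearly with sequence length rather than quadratically.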

Parameter count

175B
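
As a rough sanity check on that figure, the count can be approximated from the model's shape. The sketch below assumes the hyperparameters reported in the paper for the largest model (96 layers, model width 12288, context length 2048) and the GPT-2 vocabulary size of 50257, none of which are recorded in this note. It also uses the common 12 * d_model^2 per-block approximation, which ignores biases and layer-norm parameters.

```python
def approx_params(n_layers, d_model, vocab_size, n_ctx):
    """Approximate parameter count of a GPT-style decoder-only transformer."""
    # Per block: 4 * d^2 for attention (Q, K, V, output projections)
    # plus 8 * d^2 for an MLP with a 4x hidden width.
    block_params = 12 * d_model ** 2
    # Token embeddings plus learned positional embeddings.
    embedding_params = (vocab_size + n_ctx) * d_model
    return n_layers * block_params + embedding_params


if __name__ == "__main__":
    total = approx_params(n_layers=96, d_model=12288, vocab_size=50257, n_ctx=2048)
    print(f"{total / 1e9:.1f}B parameters (approximate)")
```

The estimate lands just under 175B; nearly all of it comes from the transformer blocks, with the embeddings contributing well under 1%.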

Bibliography

  1. Brown et al. 2020. "Language Models Are Few-Shot Learners". arXiv:2005.14165 [cs]. http://arxiv.org/abs/2005.14165.
