GPT-3

tags: Transformers, NLP, GPT
paper: (Brown et al. 2020)

Architecture

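Same architecture as GPT-2, except that the transformer layers alternate between dense and locally banded sparse attention patterns, similar to the Sparse Transformer.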
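To make the attention pattern concrete, here is a minimal sketch of a dense causal mask next to a locally banded one. Brown et al. describe the layers as alternating between dense and locally banded sparse attention; the band width, the function names, and the choice of which layers are dense are illustrative assumptions, not details from this note.

```python
def dense_causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len, band_width):
    """Causal mask restricted to the most recent `band_width` positions.

    `band_width` is a hypothetical parameter chosen for illustration.
    """
    return [
        [(j <= i) and (i - j < band_width) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_masks(n_layers, seq_len, band_width):
    """Alternate dense and banded masks across layers.

    Assumption: even-indexed layers are dense, odd-indexed layers are banded.
    """
    return [
        dense_causal_mask(seq_len) if layer % 2 == 0
        else banded_causal_mask(seq_len, band_width)
        for layer in range(n_layers)
    ]


def render(mask):
    """Draw a mask as text: 'x' where attention is allowed, '.' elsewhere."""
    return "\n".join("".join("x" if m else "." for m in row) for row in mask)


if __name__ == "__main__":
    print(render(dense_causal_mask(6)))
    print()
    print(render(banded_causal_mask(6, band_width=3)))
```

In the banded mask each row has at most `band_width` allowed positions, so attention cost per layer grows linearly with sequence length for a fixed band rather than quadratically.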

Parameter count

175B parameters for the largest model. The paper also trains smaller variants, down to 125M parameters.
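The 175B figure can be roughly checked with a back-of-the-envelope count. The sketch below assumes the hyperparameters reported in the paper for the largest model (96 layers, model width 12288, context length 2048) and the GPT-2 BPE vocabulary of 50257 tokens; none of these appear in this note. It uses the common approximation of 12 * d_model^2 weights per layer, ignoring biases and layer norms, so it is an estimate rather than an exact count.

```python
def approx_param_count(n_layers, d_model, vocab_size, n_ctx):
    """Rough decoder-only transformer parameter estimate.

    Per layer: 4 * d_model^2 for the attention projections (Q, K, V, output)
    plus 8 * d_model^2 for an MLP with a 4x hidden width.
    Biases and layer-norm parameters are ignored.
    """
    per_layer = 12 * d_model ** 2
    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + n_ctx) * d_model
    return n_layers * per_layer + embeddings


if __name__ == "__main__":
    # Values assumed from the paper's model table, not from this note.
    total = approx_param_count(n_layers=96, d_model=12288,
                               vocab_size=50257, n_ctx=2048)
    print(f"{total / 1e9:.1f}B parameters (approximate)")
```

Under these assumptions the estimate lands just under 175B, consistent with the headline number.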

Bibliography

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. June 4, 2020. "Language Models Are Few-Shot Learners". arXiv:2005.14165 [cs]. http://arxiv.org/abs/2005.14165.

Last changed 26/07/2022 | authored by Hugo Cisneros
