- tags: Transformers, NLP, GPT
- paper: (Brown et al. 2020)
Architecture
Same as GPT-2 (a decoder-only Transformer), except that the layers alternate between dense and locally banded sparse attention patterns, similar to the Sparse Transformer.
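The locally banded pattern can be sketched as a boolean attention mask. A minimal NumPy illustration (the window size and helper name are my own choices, not from the paper, and the real model alternates these banded layers with fully dense ones):

```python
import numpy as np

def banded_causal_mask(seq_len, bandwidth):
    """True where attention is allowed: position i may attend to
    position j only if j <= i (causal) and i - j < bandwidth (local band)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < bandwidth
    return causal & local

mask = banded_causal_mask(seq_len=8, bandwidth=3)
# Position 5 can see positions 3-5, but not 2 (too far back) or 6 (future).
```

Each row of the mask limits a token to a fixed-size window of recent positions, which is what makes the attention cost grow linearly in sequence length for those layers.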
Parameter count
175B
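A back-of-the-envelope check of the 175B figure, using the hyperparameters reported in Brown et al. 2020 (96 layers, d_model = 12288); the vocabulary and context sizes below are assumptions based on the GPT-2 BPE tokenizer and the reported 2048-token window:

```python
# Rough GPT-3 parameter count from published hyperparameters.
n_layers = 96
d_model = 12288
vocab_size = 50257   # GPT-2 BPE vocabulary (assumption)
context = 2048       # positional embeddings

# Each block: ~4*d^2 (Q/K/V/output projections) + ~8*d^2 (MLP) = 12*d^2.
per_layer = 12 * d_model ** 2
total = n_layers * per_layer + vocab_size * d_model + context * d_model
print(f"{total / 1e9:.1f}B parameters")  # lands close to the reported 175B
```

Biases and layer-norm parameters are omitted; they contribute a negligible fraction at this scale.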
Bibliography
- Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. "Language Models Are Few-Shot Learners". arXiv:2005.14165 [cs]. http://arxiv.org/abs/2005.14165.