- tags: Transformers, GPT
- paper: (Radford et al. 2019)
Architecture
Largely GPT with minor changes: layer normalization is moved to the input of each sub-block (pre-norm) and an extra layer norm is added after the final block; residual-layer weights are scaled at initialization by 1/√N (N = number of residual layers); the context size grows from 512 to 1024 tokens and the vocabulary to 50,257.
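The normalization change is the pre-norm ordering. A minimal sketch of such a block in PyTorch (module and parameter names here are assumptions for illustration, not OpenAI's code):

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """GPT-2-style transformer block: LayerNorm before each sub-block."""

    def __init__(self, d_model: int, n_head: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, attn_mask=None):
        # Pre-norm: normalize *before* each sub-block, then add the residual.
        # (The original GPT applied LayerNorm after the residual addition.)
        h = self.ln_1(x)
        x = x + self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)[0]
        x = x + self.mlp(self.ln_2(x))
        return x
```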
Parameter count
1.5B
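A rough back-of-envelope check of that figure, assuming the published GPT-2 XL hyperparameters (48 layers, d_model = 1600, 50,257-token vocabulary, 1024-token context); the breakdown below is an illustration, not the exact official count:

```python
n_layer, d, vocab, n_ctx = 48, 1600, 50257, 1024

embeddings = vocab * d + n_ctx * d             # token + position embeddings
attn = (d * 3 * d + 3 * d) + (d * d + d)       # fused qkv projection + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # two linear layers, 4x expansion
layer_norms = 2 * 2 * d                        # two LayerNorms per block (gain + bias)
per_layer = attn + mlp + layer_norms

total = embeddings + n_layer * per_layer + 2 * d  # plus final LayerNorm
print(f"{total / 1e9:.2f}B parameters")           # ~1.56B, close to the reported 1.5B
```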
Bibliography
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019. "Language Models Are Unsupervised Multitask Learners". OpenAI Blog 1 (8): 9.