- tags: Transformers, GPT, NLP
- paper: (Zhang et al. 2022)
Architecture
OPT uses the same decoder-only Transformer architecture as GPT-3, combined with training-efficiency improvements drawn from Megatron-LM (notably its tensor parallelism).
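Since the released checkpoints are ordinary GPT-style causal language models, they can be loaded and sampled like any other decoder-only model. A minimal sketch, assuming the Hugging Face Transformers library and the publicly released facebook/opt-125m checkpoint (the smallest of the suite):

```python
# Sketch (not from the paper): load an OPT checkpoint and generate a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # smallest OPT checkpoint; larger ones follow the same interface
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open Pre-trained Transformers are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```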
Parameter count
175B (OPT-175B, the largest model in a suite ranging from 125M to 175B parameters)
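As a rough sanity check on the headline number (my own back-of-the-envelope estimate, not from the paper): with GPT-3-sized dimensions, which OPT-175B is assumed here to match (96 layers, d_model = 12288), the Transformer blocks alone contribute about 12 · n_layers · d_model² parameters, which already lands near 175B.

```python
# Back-of-the-envelope parameter count, assuming GPT-3 175B dimensions.
n_layers = 96        # assumed, matching GPT-3 175B
d_model = 12288      # assumed, matching GPT-3 175B
vocab_size = 50272   # assumed BPE vocabulary size

block_params = 12 * n_layers * d_model ** 2  # ~4*d^2 attention + ~8*d^2 MLP per layer
embedding_params = vocab_size * d_model      # token embedding matrix (tied output head)
total = block_params + embedding_params
print(f"~{total / 1e9:.0f}B parameters")     # ~175B
```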
Bibliography
- Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, et al. 2022. "OPT: Open Pre-trained Transformer Language Models". arXiv. http://arxiv.org/abs/2205.01068.