OPT: Open Pre-trained Transformer

tags
Transformers, GPT, NLP
paper
(Zhang et al. 2022)

Architecture

OPT-175B uses the same decoder-only transformer architecture as GPT-3; the training setup draws on the Megatron-LM codebase (tensor parallelism combined with fully sharded data parallelism).
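As a rough illustration of that decoder-only design, here is a minimal single pre-norm transformer block in PyTorch. This is a generic sketch under assumptions (layer layout, naming, no dropout or weight init), not Meta's released OPT implementation.

```python
# Sketch of one GPT-3-style pre-norm decoder block (illustrative only).
import math
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),                      # OPT reports ReLU; GPT-3 uses GELU
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        # Causal multi-head self-attention with pre-layer-norm.
        h = self.ln1(x)
        q, k, v = self.qkv(h).split(d, dim=-1)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        att = att.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, d)
        x = x + self.proj(out)
        # Pre-norm feed-forward sub-layer with 4x expansion.
        x = x + self.mlp(self.ln2(x))
        return x
```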

Parameter count

175B (the largest model in the OPT family, which spans 125M to 175B parameters).
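As a sanity check, the standard approximation params ≈ 12 · n_layers · d_model² reproduces the headline figure from GPT-3's published shape (96 layers, hidden size 12288), which OPT-175B matches. The snippet below is only back-of-envelope arithmetic; it ignores embeddings, biases, and layer norms.

```python
# Back-of-envelope parameter estimate for a GPT-3-shaped model.
n_layers, d_model = 96, 12288
attention = 4 * d_model * d_model        # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)        # two linear layers with 4x expansion
total = n_layers * (attention + mlp)
print(f"{total / 1e9:.0f}B parameters")  # ~174B, close to the quoted 175B
```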

Bibliography

  1. Zhang, Susan, et al. 2022. "OPT: Open Pre-trained Transformer Language Models". arXiv. http://arxiv.org/abs/2205.01068.
