GPT-2

tags
Transformers, GPT
paper
(Radford et al. 2019)

Architecture

Minor changes from GPT: layer normalization is moved to the input of each sub-block (pre-norm), with an additional layer norm after the final self-attention block; the context window grows from 512 to 1024 tokens; and the vocabulary expands to 50,257 BPE tokens.
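The normalization reordering can be sketched as follows. This is a minimal illustration of the pre-norm vs. post-norm residual pattern only; `attn` and `mlp` stand in for the real attention and feed-forward sub-layers.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature (last) dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gpt_block(x, attn, mlp):
    # GPT-style (post-norm): normalize after the residual add.
    x = layer_norm(x + attn(x))
    x = layer_norm(x + mlp(x))
    return x

def gpt2_block(x, attn, mlp):
    # GPT-2 style (pre-norm): normalization moved to the
    # input of each sub-block; residual path stays unnormalized.
    x = x + attn(layer_norm(x))
    x = x + mlp(layer_norm(x))
    return x
```

Keeping the residual path free of normalization makes gradients flow more directly through deep stacks, which is part of why pre-norm trains more stably at GPT-2's depth.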

Parameter count

1.5B
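The 1.5B figure can be recovered with a back-of-envelope count from the largest model's config (48 layers, d_model 1600, as reported in the paper):

```python
# Approximate GPT-2 XL parameter count from its configuration.
n_layer, d_model, n_vocab, n_ctx = 48, 1600, 50257, 1024

# Each transformer block: ~4*d^2 for attention projections
# plus ~8*d^2 for the MLP (4x expansion) = 12*d^2.
blocks = 12 * n_layer * d_model ** 2

# Token and position embeddings (the output layer is tied
# to the token embedding, so it is not counted twice).
embeddings = (n_vocab + n_ctx) * d_model

total = blocks + embeddings
print(f"{total / 1e9:.2f}B")  # 1.56B, close to the reported 1.5B
```

The estimate ignores bias terms and layer-norm parameters, which contribute only a small fraction of the total.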

Bibliography

  1. Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. "Language Models Are Unsupervised Multitask Learners". OpenAI Blog 1 (8): 9.
