GPT-Neo

tags
Transformers, GPT, NLP
software
<&gpt-neo>

Architecture

GPT-Neo closely follows the GPT-2 architecture, but replaces full attention with local attention in every other layer, with a window size of 256 tokens.
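A minimal sketch of what this means in practice, assuming the standard convention that layers alternate starting with a global layer (the layer count and function names below are illustrative, not from the GPT-Neo codebase):

```python
def attention_pattern(num_layers):
    # Alternate full (global) attention and windowed (local) attention,
    # starting with a global layer.
    return ["global" if i % 2 == 0 else "local" for i in range(num_layers)]

def local_causal_mask(seq_len, window=256):
    # In a local layer, token i attends only to the last `window`
    # positions up to and including itself: j in (i - window, i].
    # A 1 marks an allowed attention edge, 0 a masked one.
    return [[1 if 0 <= i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]
```

With `window=256`, each local layer sees at most the preceding 255 tokens plus the current one, while the interleaved global layers retain the full causal context.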

Parameter count

1.3B, 2.7B
