GPT-Neo

tags
Transformers, GPT, NLP
software
<&gpt-neo>

Architecture

GPT-Neo closely follows the GPT-2 architecture, but replaces full attention with local attention in every other layer, with a window size of 256 tokens.
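A minimal sketch of what this means in practice, assuming the standard convention that layers alternate starting with a global layer (the layer count and function names below are illustrative, not from the GPT-Neo codebase):

```python
def attention_pattern(num_layers):
    # Alternate full (global) attention and windowed (local) attention,
    # starting with a global layer.
    return ["global" if i % 2 == 0 else "local" for i in range(num_layers)]

def local_causal_mask(seq_len, window=256):
    # In a local layer, token i attends only to the last `window`
    # positions up to and including itself: j in (i - window, i].
    # A 1 marks an allowed attention edge, 0 a masked one.
    return [[1 if 0 <= i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]
```

With `window=256`, each local layer sees at most the preceding 255 tokens plus the current one, while the interleaved global layers retain the full causal context.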

Parameter count

1.3B, 2.7B
