- tags
- Transformers, GPT, NLP
- software
- <&gpt-neo>
Architecture
The architecture is very similar to GPT-2's, except that GPT-Neo uses local attention in every other layer, with a window size of 256 tokens.
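As a rough illustration, here is a minimal NumPy sketch of the two attention masks such an alternating scheme produces; this is not EleutherAI's actual implementation, and the even/odd layer assignment (global on even layers, local on odd) is an assumption.

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Standard GPT-2-style mask: each token attends to itself
    # and all earlier tokens.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def local_causal_mask(seq_len: int, window: int = 256) -> np.ndarray:
    # Local attention: each token attends only to the previous
    # `window` tokens (itself included), turning the full lower
    # triangle into a band of width `window`.
    too_far_back = np.tril(np.ones((seq_len, seq_len), dtype=bool), k=-window)
    return causal_mask(seq_len) & ~too_far_back

def mask_for_layer(layer_idx: int, seq_len: int, window: int = 256) -> np.ndarray:
    # Assumed ordering: global attention on even layers,
    # local attention on odd layers.
    if layer_idx % 2 == 0:
        return causal_mask(seq_len)
    return local_causal_mask(seq_len, window)
```

With this construction, a token at position i in a local layer attends to positions i−255 through i, while tokens in a global layer attend to the entire prefix as in GPT-2.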
Parameter count
1.3B (XL), 2.7B
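The released checkpoints are hosted on the Hugging Face Hub (EleutherAI/gpt-neo-1.3B and EleutherAI/gpt-neo-2.7B), so the parameter counts can be verified directly; a minimal sketch using the `transformers` library:

```python
from transformers import GPTNeoForCausalLM

# Download the 1.3B checkpoint and count its parameters.
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
print(sum(p.numel() for p in model.parameters()))  # ~1.3 billion
```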