# GPT-Neo

tags
Transformers, GPT, NLP
software
<&gpt-neo>

## Architecture

The architecture is very similar to GPT-2, except that it uses local self-attention in every other layer, with a window size of 256 tokens.
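The alternating local-attention pattern can be sketched as a banded causal mask: each position attends only to itself and the previous `window - 1` tokens. A minimal illustration (NumPy, names hypothetical; the actual implementation lives in the model code):

```python
import numpy as np

def local_causal_mask(seq_len: int, window: int = 256) -> np.ndarray:
    """Boolean mask where mask[i, j] is True iff position i may attend to j.

    Causal (j <= i) and local (j within the last `window` positions,
    including i itself). Hypothetical helper for illustration only.
    """
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# With window=2, position 3 attends only to positions 2 and 3.
mask = local_causal_mask(5, window=2)
```

In the full model, layers alternate between this local mask and the standard causal (global) mask.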

1.3B, 2.7B (XL)