# GPT-Neo

<&gpt-neo>

## Architecture

The architecture closely follows GPT-2. The main difference is that every other layer uses local attention, restricted to a window of 256 tokens, instead of full (global) causal attention.
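The alternating pattern can be illustrated with a small, dependency-free sketch. It is not the model's actual implementation. The helper names `layer_attention_types` and `causal_mask` are hypothetical, and two details are assumptions rather than statements from this page: that the first layer is global, and that a window of `w` tokens covers the current position plus the `w - 1` positions before it.

```python
def layer_attention_types(num_layers):
    """Alternate attention types per layer.

    Assumption: layer 0 is global, layer 1 is local, and so on.
    """
    return ["global" if i % 2 == 0 else "local" for i in range(num_layers)]


def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key j.

    window=None gives ordinary causal (global) attention. An integer window
    limits each position to itself and the window - 1 positions before it
    (assumed convention).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


WINDOW_SIZE = 256  # window size stated in the text

types = layer_attention_types(4)
global_mask = causal_mask(6)
local_mask = causal_mask(6, window=3)  # tiny window so the effect is visible

for row in local_mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With the toy window of 3, the last row of `local_mask` permits only the final three positions, while the same row of `global_mask` permits all six. With the real window of 256, sequences shorter than 256 tokens behave identically in local and global layers.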

Model sizes: 1.3B (XL) and 2.7B parameters.