- tags
- Transformers, NLP
- paper
- (Zaheer et al. 2021)
Architecture
BigBird can be used as either an encoder-only or an encoder-decoder architecture.
It extends BERT-style models with a sparse attention mechanism combining local (sliding-window), global, and random attention, which reduces the attention computational complexity from quadratic to linear in sequence length.
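The sparsity pattern can be illustrated with a toy attention mask. This is a minimal sketch, not the paper's blocked implementation: the window size, number of global tokens, and number of random tokens below are arbitrary illustrative values, and real implementations work on blocks of tokens for efficiency.

```python
import numpy as np

def bigbird_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Toy BigBird-style sparse attention mask: each query attends to a
    local window, a few global tokens, and a few random tokens.
    (Parameter values here are illustrative, not from the paper.)"""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        # sliding-window attention around position i
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True
        # random attention: a handful of arbitrary key positions
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True
    # global tokens attend to everything and are attended to by everyone
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    m = bigbird_mask(64)
    # nonzero entries grow roughly linearly with seq_len,
    # versus 64 * 64 = 4096 for full quadratic attention
    print(m.sum(), m.size)
```

Because each row has a bounded number of attended positions (plus the fixed global columns), the total number of computed attention scores scales linearly with sequence length rather than quadratically.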
Bibliography
- Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, et al. "Big Bird: Transformers for Longer Sequences". arXiv.