Big Bird

tags: Transformers, NLP
paper: (Zaheer et al. 2021)

Architecture

Big Bird can be used as either an encoder-only or an encoder-decoder architecture.

It extends BERT-style models with a sparse attention mechanism that combines sliding-window, global, and random attention. This reduces the cost of attention from quadratic to linear in the sequence length.
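To make the complexity claim concrete, here is a minimal sketch of a Big Bird-style attention mask in plain Python. It is an illustration only, not the paper's implementation, which works on blocks of tokens rather than single tokens. The function names and the parameters `window`, `num_global`, `num_random`, and `seed` are hypothetical and chosen for this note.

```python
import random


def sparse_attention_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query token i may attend to key token j.
    Token-level sketch; all parameter names are assumptions for illustration.
    """
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Sliding window: each token attends to its nearest neighbours.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Global tokens: the first num_global tokens attend to every token
        # and are attended to by every token.
        for g in range(min(num_global, seq_len)):
            mask[i][g] = True
            mask[g][i] = True
        # Random attention: a few extra keys sampled for each query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True
    return mask


def count_attended_pairs(mask):
    """Number of (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    for n in (64, 128, 256):
        sparse = count_attended_pairs(sparse_attention_mask(n))
        print(f"n={n}: sparse pairs={sparse}, full pairs={n * n}")
```

Each non-global query attends to at most `window + num_global + num_random` keys, so the number of computed pairs grows in proportion to the sequence length, whereas full attention computes `seq_len ** 2` pairs.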

Bibliography

  1. Zaheer et al. (2021). "Big Bird: Transformers for Longer Sequences". arXiv.
