- tags
- Transformers, Computer vision, BERT
- paper
- (Dosovitskiy et al. 2021)
Architecture
The Vision Transformer (ViT) adapts the BERT-style Transformer encoder to images: each image is split into fixed-size patches (e.g. 16x16), each patch is flattened and linearly projected to an embedding, position embeddings are added, and a prepended [class] token's output representation is used for classification.
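The patch-embedding step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the shapes follow ViT-Base (16x16 patches, embedding dimension 768 on a 224x224 image), and the projection, class token, and position embeddings are random placeholders standing in for learned parameters.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened (H/patch * W/patch, patch*patch*C) patches."""
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return patches

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
patches = patchify(image)                 # (196, 768): 14x14 patches, each 16*16*3 values

# Linear patch embedding, prepended [class] token, learned position embeddings
d_model = 768
W_embed = rng.standard_normal((patches.shape[1], d_model)) * 0.02  # placeholder weights
cls_token = np.zeros((1, d_model))                                 # placeholder [class] token
tokens = np.concatenate([cls_token, patches @ W_embed])            # (197, 768)
pos_embed = rng.standard_normal(tokens.shape) * 0.02               # placeholder positions
tokens = tokens + pos_embed   # input sequence for a standard Transformer encoder
```

The resulting sequence of 197 tokens is what the otherwise-unmodified Transformer encoder consumes; the [class] token's final hidden state feeds the classification head.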
Parameter count
86M (ViT-Base) to 632M (ViT-Huge)
Bibliography
- Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. 2021. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv:2010.11929.