Vision transformer

tags: Transformers, Computer vision, BERT
paper: (Dosovitskiy et al. 2021)

Architecture

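It is an extension of the BERT architecture to images. The image is split into fixed-size patches (typically 16x16 pixels, as in the paper's title), each patch is flattened and linearly projected to an embedding, a learnable [class] token is prepended as in BERT, and position embeddings are added. This sequence is fed to a standard Transformer encoder, and the output representation of the [class] token is used for image classification.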
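A minimal NumPy sketch of the input pipeline described above, assuming a 224x224 RGB image, 16x16 patches and a hidden size of 768 (the ViT-Base setting). The function names `patchify` and `embed` are hypothetical, and the projection, [class] token and position embeddings are random stand-ins for parameters that would be learned during training.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into an (N, patch*patch*C) array of flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def embed(image: np.ndarray, patch: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Turn an image into the token sequence fed to the Transformer encoder."""
    patches = patchify(image, patch)
    n, p = patches.shape
    # Random stand-ins for learned parameters (assumption: small normal init)
    w_proj = rng.normal(0.0, 0.02, size=(p, dim))
    b_proj = np.zeros(dim)
    cls_token = rng.normal(0.0, 0.02, size=(1, dim))
    pos_embed = rng.normal(0.0, 0.02, size=(n + 1, dim))
    tokens = patches @ w_proj + b_proj                      # linear patch embedding
    tokens = np.concatenate([cls_token, tokens], axis=0)    # prepend [class] token
    return tokens + pos_embed                               # add position embeddings


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
seq = embed(img, patch=16, dim=768, rng=rng)
print(seq.shape)  # (197, 768)
```

A 224x224 image gives 14 x 14 = 196 patches, plus the [class] token, so the encoder sees a sequence of 197 tokens, comparable in length to a BERT input sentence.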

Parameter count

86M (ViT-Base) to 632M (ViT-Huge)
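A rough back-of-the-envelope check of these figures, assuming the layer count, hidden size and MLP size of the ViT-Base, ViT-Large and ViT-Huge configurations from the paper, a 224x224 input, biases on every linear layer, and no classification head. The function `vit_params` is a hypothetical helper, not code from the paper.

```python
def vit_params(layers: int, hidden: int, mlp: int, patch: int,
               image_size: int = 224, channels: int = 3) -> int:
    """Approximate ViT parameter count, excluding the classification head."""
    d, m = hidden, mlp
    per_layer = (
        4 * d * d + 4 * d      # Q, K, V and output projections (weights + biases)
        + 2 * d * m + m + d    # two-layer MLP (weights + biases)
        + 4 * d                # two LayerNorms (scale + shift each)
    )
    num_patches = (image_size // patch) ** 2
    embedding = (
        patch * patch * channels * d + d   # linear patch projection
        + d                                # [class] token
        + (num_patches + 1) * d            # position embeddings
    )
    final_norm = 2 * d                     # LayerNorm after the last block
    return layers * per_layer + embedding + final_norm


configs = {
    "ViT-Base/16": (12, 768, 3072, 16),
    "ViT-Large/16": (24, 1024, 4096, 16),
    "ViT-Huge/14": (32, 1280, 5120, 14),
}
for name, cfg in configs.items():
    print(f"{name}: {vit_params(*cfg) / 1e6:.1f}M")
```

This gives about 85.8M for ViT-Base/16 and 630.8M for ViT-Huge/14, close to the 86M and 632M quoted above; the small remaining gap plausibly comes from the classification head and other details this sketch leaves out. Almost all parameters sit in the encoder blocks, whose size grows roughly with layers x hidden².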

Bibliography

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. June 3, 2021. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv. DOI.


Last changed 27/07/2022 | authored by Hugo Cisneros
