Vision transformer

tags
Transformers, Computer vision, BERT
paper
(Dosovitskiy et al. 2021)

Architecture

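The Vision Transformer (ViT) adapts the BERT-style Transformer encoder to images. An image is split into fixed-size patches (16x16 pixels in the configuration the paper's title refers to), each patch is flattened and linearly projected to an embedding, and the resulting sequence is processed the same way BERT processes a sequence of word tokens.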
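To make the "patches of images" idea concrete, here is a minimal, dependency-free sketch of the patch-splitting step. The function name `patchify` and the nested-list image layout (height x width x channels) are assumptions made for illustration and are not taken from the paper's code. The learned linear projection, class token, and position embeddings that follow this step are omitted.

```python
def patchify(image, patch_size):
    """Split an image (nested lists, height x width x channels) into flattened patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    Illustrative helper only; real implementations use an array library.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch: every channel value of every pixel in the window.
            patch = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            patches.append(patch)
    return patches


# A blank 224x224 RGB image split into 16x16 patches.
blank = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
sequence = patchify(blank, 16)
print(len(sequence), len(sequence[0]))
```

With a 224x224 RGB input and 16x16 patches, this yields 196 patches of 768 values each, which play the role of the token sequence in BERT.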

Parameter count

86M (ViT-Base) to 632M (ViT-Huge) parameters
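As a rough sanity check on this range, the count can be estimated from the model dimensions. The sketch below assumes the layer counts, hidden sizes, and MLP sizes as I recall them from the paper's model table, a 224x224 RGB input, and a standard Transformer encoder layer with biases. The function name `estimate_vit_params` is hypothetical, and the classification head is left out, so the results are expected to land near the reported figures rather than match them exactly.

```python
def estimate_vit_params(layers, hidden, mlp, patch, image=224, channels=3):
    """Approximate parameter count of a ViT encoder, excluding the classification head."""
    num_patches = (image // patch) ** 2

    # Input side: linear patch projection, class token, learned position embeddings.
    patch_embedding = patch * patch * channels * hidden + hidden
    class_token = hidden
    position_embeddings = (num_patches + 1) * hidden

    # One encoder layer: Q, K, V and output projections, a two-layer MLP,
    # and two layer norms (scale and shift each).
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = hidden * mlp + mlp + mlp * hidden + hidden
    layer_norms = 2 * 2 * hidden
    per_layer = attention + feed_forward + layer_norms

    final_norm = 2 * hidden
    return (patch_embedding + class_token + position_embeddings
            + layers * per_layer + final_norm)


# Assumed configurations (recalled from the paper, not stated in these notes).
CONFIGS = {
    "ViT-Base/16": dict(layers=12, hidden=768, mlp=3072, patch=16),
    "ViT-Huge/14": dict(layers=32, hidden=1280, mlp=5120, patch=14),
}

for name, config in CONFIGS.items():
    print(f"{name}: {estimate_vit_params(**config) / 1e6:.1f}M")
```

Under these assumptions the estimate comes out at about 85.8M for ViT-Base/16 and 630.8M for ViT-Huge/14, consistent with the 86M to 632M range once the omitted head is taken into account.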

Bibliography

  1. Dosovitskiy et al. 2021. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv.
