- tags
- Transformers, ViT, Computer vision
- paper
- (Liu et al. 2021)
Architecture
This model extends ViT by replacing global multi-head self-attention with self-attention computed inside non-overlapping local windows that are shifted between consecutive blocks. This reduces the attention cost from quadratic to linear in image size and lets the model build hierarchical feature maps, allowing it to handle higher-resolution images than ViT.
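The window partitioning and the shift between blocks can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation (which uses PyTorch and handles the shifted case with masked attention rather than recomputing windows); the function names and the toy sizes are my own.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns shape (num_windows, window_size * window_size, C); self-attention
    is then computed independently within each window, so cost grows linearly
    with the number of windows instead of quadratically with H * W.
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, C)

def cyclic_shift(x, shift):
    """Cyclically shift the map so the next block's windows straddle the
    previous block's window boundaries (the "shifted windows" idea)."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

# Toy feature map: 8x8 spatial grid, 3 channels, 4x4 windows.
x = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
regular = window_partition(x, window_size=4)
shifted = window_partition(cyclic_shift(x, shift=2), window_size=4)
print(regular.shape)  # (4, 16, 3): 4 windows of 16 tokens each
```

Alternating regular and shifted partitions across blocks lets information flow between neighboring windows while each attention computation stays local.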
Parameter count
29M (Swin-T, the smallest variant) to 197M (Swin-L, the largest)
Bibliography
- Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. 2021. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows". arXiv.