Swin Transformer

tags
Transformers, ViT, Computer vision
paper
(Liu et al. 2021)

Architecture

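This model extends ViT by replacing global multi-head self-attention with a "shifted windows" module. Self-attention is computed only inside local, non-overlapping windows, and the window partition is shifted between successive blocks so that information can flow across window boundaries. Because each window has a fixed size, attention cost grows linearly rather than quadratically with the number of image patches. This is what lets the model work with higher-resolution images. Patch-merging layers between stages also build a hierarchical feature map, as in CNNs.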
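A minimal pure-Python sketch of the two operations behind shifted windows, written under simplifying assumptions: tokens are integer IDs on an 8 x 8 grid, the window size is 4 rather than the paper's default of 7, and the attention mask that the real model applies to wrapped-around tokens after the cyclic shift is left out. The helper names (`window_partition`, `cyclic_shift`, `source_window`, `attention_cost`) are hypothetical, not from any library. The cost function follows the complexity analysis in the paper, 4hwC² + 2(hw)²C for global attention versus 4hwC² + 2M²hwC for window attention.

```python
# Toy illustration of Swin-style (shifted) window partitioning.
# Assumptions: tokens are integer IDs on a square grid, the grid side is
# divisible by the window size, and the attention mask used in the real
# model for wrapped-around tokens is NOT modelled here.


def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows.

    Windows are returned in row-major order, each as a flat list of tokens.
    """
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must be divisible by window size"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append(
                [grid[top + i][left + j] for i in range(m) for j in range(m)]
            )
    return windows


def cyclic_shift(grid, s):
    """Roll the grid up and left by s rows/columns, wrapping at the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


def source_window(token, width, m):
    """Index of the unshifted window that a token ID originally belonged to."""
    row, col = divmod(token, width)
    return (row // m) * (width // m) + col // m


def attention_cost(h, w, c, m=None):
    """Approximate attention cost for an h x w patch grid with c channels.

    Global MSA:  4*h*w*c^2 + 2*(h*w)^2*c   (quadratic in h*w)
    Window MSA:  4*h*w*c^2 + 2*m^2*h*w*c   (linear in h*w for fixed m)
    """
    projections = 4 * h * w * c * c
    if m is None:
        return projections + 2 * (h * w) ** 2 * c
    return projections + 2 * m * m * h * w * c


if __name__ == "__main__":
    side, m = 8, 4
    grid = [[r * side + col for col in range(side)] for r in range(side)]

    # Regular partition: each window only sees its own 16 tokens.
    regular = window_partition(grid, m)
    print(len(regular))

    # Shifted partition (shift = m // 2): the first window now straddles
    # all four regular windows, so information can cross their boundaries.
    shifted = window_partition(cyclic_shift(grid, m // 2), m)
    print(sorted({source_window(t, side, m) for t in shifted[0]}))

    # Cost comparison on an illustrative 56 x 56 patch grid, 96 channels, M = 7.
    print(attention_cost(56, 56, 96), attention_cost(56, 56, 96, 7))
```

Running it prints `4`, then `[0, 1, 2, 3]` for the shifted window, showing that one shifted window mixes tokens from every regular window. Doubling both grid sides multiplies the windowed cost by exactly 4 (linear in the number of patches), while the global cost grows faster because of its (hw)² term.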

Parameter count

29M - 197M

Bibliography

  1. Liu et al. 2021. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows". arXiv. DOI.
