Swin Transformer

tags: Transformers, ViT, Computer vision
paper: (Liu et al. 2021)

Architecture

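This model extends ViT by replacing global multi-head self-attention with a “shifted windows” module: attention is computed inside local, non-overlapping windows, and the window grid is shifted between consecutive layers so that information can flow across window boundaries. Because each token attends only to a fixed-size window, the attention cost grows linearly rather than quadratically with the number of patches, which allows ViT to work with higher resolution images.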
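To illustrate the shifted-window idea, here is a minimal pure-Python sketch, not the authors' implementation. It only tracks which token positions end up in the same window, with and without a shift of half the window size. The function names `window_partition` and `share_window` are made up for this note, and the sketch leaves out the attention itself as well as the masking that the real model applies to tokens that wrap around the image border.

```python
def window_partition(height, width, window, shift=0):
    """Group token coordinates into non-overlapping window x window blocks.

    With shift > 0 the grid is cyclically rolled first, so tokens that sat in
    different windows of the regular layout can land in the same window.
    """
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    groups = {}
    for row in range(height):
        for col in range(width):
            # Position of this token after rolling the grid up/left by `shift`.
            rolled_row = (row - shift) % height
            rolled_col = (col - shift) % width
            key = (rolled_row // window, rolled_col // window)
            groups.setdefault(key, []).append((row, col))
    return list(groups.values())


def share_window(groups, a, b):
    """Return True if tokens a and b belong to the same window."""
    return any(a in group and b in group for group in groups)


# 8x8 token grid, 4x4 windows; the shifted layout moves by half a window.
regular = window_partition(8, 8, window=4)
shifted = window_partition(8, 8, window=4, shift=2)

# Two diagonal neighbours on opposite sides of a window boundary.
a, b = (3, 3), (4, 4)
print(share_window(regular, a, b), share_window(shifted, a, b))
```

In the regular layout the two tokens never attend to each other; in the shifted layout they do, which is how alternating the two layouts connects neighbouring windows while each layer still only attends within fixed-size windows.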

Parameter count

29M to 197M parameters (Swin-T to Swin-L)

Bibliography

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. August 17, 2021. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows". arXiv. DOI.

Last changed 27/07/2022 | authored by Hugo Cisneros
