Transformers are a neural network architecture based on a mechanism called Attention.
They have been particularly successful in NLP, a trend that took off with the publication of a highly influential paper by Vaswani and colleagues (Vaswani et al. 2017). Transformers turned out to be very effective language models.
Transformers in PyTorch
PyTorch has several implementations of transformers, and the simplest to use is
torch.nn.Transformer. For example, its documentation proposes the following example:
import torch
import torch.nn as nn

transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))
out = transformer_model(src, tgt)
The only issue is that this implementation abstracts away most of what is going on inside a Transformer. That is great for rapidly firing up a large and complex model, but not so great for understanding how it works.
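To see what the abstraction hides, here is a minimal sketch of the core operation of the Vaswani et al. (2017) architecture, scaled dot-product attention. The function name and tensor shapes are illustrative choices, not part of the torch.nn.Transformer API:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    # Similarity of each query with each key: (batch, seq_q, seq_k)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Each row of weights sums to 1 over the keys
    weights = F.softmax(scores, dim=-1)
    # Weighted average of the value vectors: (batch, seq_q, d_k)
    return weights @ v

# Toy input: batch of 2 sequences of length 10 with dimension 64
q = torch.rand(2, 10, 64)
k = torch.rand(2, 10, 64)
v = torch.rand(2, 10, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 10, 64)
```

The real nn.Transformer layers wrap this in multi-head projections, residual connections, and layer normalization, but the weighted-average mechanism above is the heart of it.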
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. 2017. "Attention Is All You Need". arXiv:1706.03762 [cs]. http://arxiv.org/abs/1706.03762.