Transformers

tags: Neural networks
resources: Transformer catalog, The illustrated transformer

Transformers are a neural network architecture based on a mechanism called Attention.
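As a rough illustration of the mechanism, here is a minimal sketch of scaled dot-product attention, the form of attention used by Vaswani et al. (2017), written with plain PyTorch tensor operations. The function name and its boolean mask convention (True means "this position is hidden") are choices made for this sketch, not a library API. Each query is compared with every key, the scaled scores are turned into weights with a softmax, and the output is a weighted average of the values.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of scaled dot-product attention.

    q: (..., L_q, d_k), k: (..., L_k, d_k), v: (..., L_k, d_v)
    mask: optional bool tensor broadcastable to (..., L_q, L_k);
          True marks key positions a query must NOT attend to
          (a convention chosen for this sketch).
    """
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Hidden positions get -inf, hence zero weight after the softmax
        scores = scores.masked_fill(mask, float("-inf"))
    # Each row becomes a probability distribution over the keys
    weights = torch.softmax(scores, dim=-1)
    # Output: weighted average of the values
    return weights @ v, weights

# Usage: a batch of 2 sequences, 5 tokens each, 8-dimensional vectors
torch.manual_seed(0)
x = torch.rand(2, 5, 8)
out, weights = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)  # torch.Size([2, 5, 8])
```

Passing a mask such as torch.triu(torch.ones(5, 5, dtype=torch.bool), diagonal=1) hides future tokens, which is how a decoder is typically prevented from looking ahead.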

They have been particularly successful in NLP, starting with the publication of a very influential paper by Vaswani and colleagues (Vaswani et al. 2017). Transformers turned out to be very effective language models.

They have since spread to other fields of machine learning, such as Computer vision and Reinforcement learning.

Transformers in PyTorch

PyTorch has several implementations of transformers, and the simplest to use is torch.nn.Transformer. Its documentation gives the following example:

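```python
import torch
import torch.nn as nn

transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
src = torch.rand((10, 32, 512))  # (source length, batch size, d_model)
tgt = torch.rand((20, 32, 512))  # (target length, batch size, d_model)
out = transformer_model(src, tgt)  # shape: (20, 32, 512)
```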
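To get a feel for what this single call builds, one can inspect the submodules of the resulting model. This is a sketch assuming a recent PyTorch release; the attribute names used here (encoder, decoder, layers, self_attn, linear1) belong to the current torch.nn implementation and could change between versions.

```python
import torch
import torch.nn as nn

transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)

# nn.Transformer bundles an encoder stack and a decoder stack
print(type(transformer_model.encoder).__name__)  # TransformerEncoder
print(type(transformer_model.decoder).__name__)  # TransformerDecoder
print(len(transformer_model.encoder.layers))     # 12 (num_encoder_layers)
print(len(transformer_model.decoder.layers))     # 6 (default num_decoder_layers)

# Each encoder layer wraps multi-head self-attention and a feed-forward network
layer = transformer_model.encoder.layers[0]
print(type(layer.self_attn).__name__)  # MultiheadAttention
print(layer.linear1)  # Linear(in_features=512, out_features=2048, bias=True)

# Total number of trainable parameters, to get a sense of the model's size
n_params = sum(p.numel() for p in transformer_model.parameters())
print(f"{n_params:,} parameters")
```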

The downside is that this implementation abstracts away much of what happens inside a Transformer. This is great for quickly spinning up a large and complex model, but not so great for understanding how it works.
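To make the internals more visible, here is a simplified, hypothetical encoder layer (MinimalEncoderLayer is a name invented for this sketch) assembled from lower-level building blocks: nn.MultiheadAttention, nn.Linear and nn.LayerNorm. It follows the post-normalization layout of the original paper, but it omits dropout, padding masks and positional encodings, so it is a teaching sketch rather than a drop-in replacement for nn.TransformerEncoderLayer.

```python
import torch
import torch.nn as nn

class MinimalEncoderLayer(nn.Module):
    """Hypothetical, simplified post-LayerNorm encoder layer (no dropout)."""

    def __init__(self, d_model=512, nhead=8, dim_feedforward=2048):
        super().__init__()
        # Expects inputs shaped (sequence length, batch size, d_model)
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        # Position-wise feed-forward network, applied to each token independently
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Linear(dim_feedforward, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # 1. Multi-head self-attention: queries, keys and values all come from x
        attn_out, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        # 2. Residual connection followed by layer normalization
        x = self.norm1(x + attn_out)
        # 3. Feed-forward network, again with a residual connection and normalization
        x = self.norm2(x + self.ff(x))
        return x

src = torch.rand(10, 32, 512)  # (sequence length, batch size, d_model)
layer = MinimalEncoderLayer()
out = layer(src)
print(out.shape)  # torch.Size([10, 32, 512])
```

Because each layer maps a (sequence, batch, d_model) tensor to a tensor of the same shape, several such layers can be stacked, which is roughly what the encoder inside nn.Transformer does with its own, more complete layers.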

Bibliography

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. December 5, 2017. "Attention Is All You Need". arXiv:1706.03762 [cs]. http://arxiv.org/abs/1706.03762.
