
Transformers are a neural network architecture built around a mechanism called attention.
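As a sketch of what that attention mechanism computes, here is a minimal scaled dot-product attention function (the variant used in the Vaswani et al. paper); the tensor shapes are illustrative assumptions, not anything prescribed by the architecture:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v

q = torch.rand(2, 5, 64)  # (batch, query positions, head dimension)
k = torch.rand(2, 7, 64)  # (batch, key positions, head dimension)
v = torch.rand(2, 7, 64)
out = scaled_dot_product_attention(q, k, v)  # (2, 5, 64)
```

Each output position is a weighted average of the value vectors, with weights given by how strongly the corresponding query matches each key.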

They have been particularly successful in NLP applications, a trend that started with the publication of a very influential paper by Vaswani and colleagues (Vaswani et al. 2017). Transformers turned out to be very effective language models.

They have also spread to other fields of machine learning, such as computer vision and reinforcement learning.

Transformers in PyTorch

PyTorch has several implementations of transformers; the simplest to use is torch.nn.Transformer. Its documentation gives the following example:

import torch
import torch.nn as nn

# d_model defaults to 512; nhead must divide d_model evenly
transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
src = torch.rand((10, 32, 512))  # (source sequence length, batch size, d_model)
tgt = torch.rand((20, 32, 512))  # (target sequence length, batch size, d_model)
out = transformer_model(src, tgt)  # same shape as tgt: (20, 32, 512)

The only issue is that this implementation abstracts away much of what goes on inside a Transformer. That is great for rapidly firing up a large and complex model, but not so great for understanding how it works.
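PyTorch also exposes lower-level building blocks, which make the internals somewhat easier to inspect. A minimal sketch using nn.MultiheadAttention, one of the pieces nn.Transformer is assembled from (the shapes here mirror the example above):

```python
import torch
import torch.nn as nn

# A single multi-head attention module, rather than the full encoder-decoder stack
mha = nn.MultiheadAttention(embed_dim=512, num_heads=16)
x = torch.rand(10, 32, 512)  # (sequence length, batch size, embedding dim)

# Self-attention: the same tensor serves as query, key, and value
attn_out, attn_weights = mha(x, x, x)
# attn_out has the same shape as x: (10, 32, 512)
# attn_weights (averaged over heads) has shape (batch, target len, source len): (32, 10, 10)
```

Inspecting attn_weights directly is one way to see which positions attend to which, something the all-in-one nn.Transformer interface hides.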


  1. Vaswani, Ashish, et al. 2017. "Attention Is All You Need". arXiv:1706.03762 [cs].

