Megatron

tags
Transformers, GPT, BERT, T5
paper
(Shoeybi et al. 2020)

Architecture

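Megatron does not define a new architecture. It scales existing Transformer architectures (GPT, BERT, T5) with model parallelism: the weights of a model are split across several devices, so that models too large for a single device can be trained. Its parameter count therefore depends on the base model used.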
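
The idea can be illustrated on a single Transformer MLP block. The following is a minimal sketch, not Megatron's actual code: it assumes the split described in the paper for the MLP block (first weight matrix split by columns, second by rows, partial outputs summed), simulates the devices with plain Python lists, and leaves out biases. All function names (`mlp_reference`, `mlp_model_parallel`, `split_columns`, `split_rows`) are hypothetical and chosen for illustration only.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def gelu(m):
    """Exact GeLU, applied element-wise."""
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in m]


def mlp_reference(x, w1, w2):
    """Unsplit MLP block: GeLU(x @ w1) @ w2."""
    return matmul(gelu(matmul(x, w1)), w2)


def split_columns(w, n):
    """Split a matrix into n equal blocks of columns."""
    width = len(w[0]) // n
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(n)]


def split_rows(w, n):
    """Split a matrix into n equal blocks of rows."""
    height = len(w) // n
    return [w[i * height:(i + 1) * height] for i in range(n)]


def mlp_model_parallel(x, w1, w2, n_shards):
    """Same MLP, with each simulated device holding one shard of w1 and w2."""
    shards = zip(split_columns(w1, n_shards), split_rows(w2, n_shards))
    # GeLU is element-wise, so each device can apply it to its own slice
    # of the hidden activations without communicating with the others.
    partials = [matmul(gelu(matmul(x, a)), b) for a, b in shards]
    # Summing the partial outputs stands in for an all-reduce across devices.
    out = partials[0]
    for p in partials[1:]:
        out = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(out, p)]
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = random_matrix(4, 8, rng)    # 4 tokens, hidden size 8 (toy sizes)
w1 = random_matrix(8, 32, rng)  # hidden -> 4 * hidden
w2 = random_matrix(32, 8, rng)  # 4 * hidden -> hidden

full = mlp_reference(x, w1, w2)
sharded = mlp_model_parallel(x, w1, w2, n_shards=4)
max_diff = max(abs(u - v) for r1, r2 in zip(full, sharded) for u, v in zip(r1, r2))
```

Here `max_diff` should be at floating-point noise level: the split changes where the computation happens, not its result. Each simulated device stores only a quarter of `w1` and `w2`, which is why the total parameter count is still set by the base model rather than by the parallelism.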

Bibliography

1. Shoeybi et al. (2020). "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". arXiv.