Megatron

tags
Transformers, GPT, BERT, T5
paper
(Shoeybi et al. 2020)

Architecture

The principle of Megatron is to scale existing Transformer architectures (GPT, BERT, T5) with intra-layer model parallelism: the large matrix multiplications in each MLP and self-attention block are split across GPUs, with the first weight matrix partitioned by columns and the second by rows so that only one all-reduce is needed per block in the forward pass. Its number of parameters depends on the base model being scaled.
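The column/row split of the MLP can be illustrated with a minimal numpy sketch. This is not Megatron's actual API; the shapes, names, and two-way split are illustrative, and the all-reduce is stood in for by a plain sum:

```python
import numpy as np

# Sketch of Megatron's tensor-parallel MLP split:
# column-parallel first GEMM, row-parallel second GEMM.
# All names and shapes here are illustrative, not Megatron's API.

rng = np.random.default_rng(0)
d, h = 8, 16                    # hidden size, FFN size
X = rng.normal(size=(4, d))     # a batch of activations
W1 = rng.normal(size=(d, h))    # first MLP weight
W2 = rng.normal(size=(h, d))    # second MLP weight

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Serial reference: Y = GeLU(X W1) W2
Y_ref = gelu(X @ W1) @ W2

# Two "GPUs": split W1 by columns, W2 by rows.
W1_a, W1_b = np.split(W1, 2, axis=1)
W2_a, W2_b = np.split(W2, 2, axis=0)

# Each rank computes its shard independently; because the column split
# keeps GeLU elementwise-local, no communication is needed before it.
Y_a = gelu(X @ W1_a) @ W2_a
Y_b = gelu(X @ W1_b) @ W2_b

# A single all-reduce (here: a plain sum) recovers the full output.
Y = Y_a + Y_b
assert np.allclose(Y, Y_ref)
```

The point of the column-then-row ordering is that the nonlinearity falls between the two GEMMs, so each GPU can apply it locally and synchronization is deferred to one all-reduce at the end of the block.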

Bibliography

  1. Shoeybi, Mohammad, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". March 13, 2020.
