- tags
- Transformers, GPT, BERT, T5
- paper
- (Shoeybi et al. 2020)
Architecture
The principle of Megatron is to extend existing transformer architectures (GPT, BERT, T5) with intra-layer model parallelism: the large matrix multiplications inside each transformer layer are partitioned across GPUs, so each GPU holds only a shard of the weights. Its number of parameters depends on the base model used and on how far that model is scaled up; the parallelism scheme itself adds none.
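The partitioning described in the paper can be sketched for the MLP block of a transformer layer. In Megatron's scheme, the first weight matrix is split column-wise and the second row-wise, so the nonlinearity stays local to each GPU and only one reduction is needed at the end. A minimal NumPy sketch, simulating two "GPUs" on one machine (array shapes and variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Toy transformer MLP block: Z = GeLU(X @ A) @ B
X = rng.standard_normal((4, 8))    # (tokens, hidden)
A = rng.standard_normal((8, 32))   # hidden -> 4*hidden
B = rng.standard_normal((32, 8))   # 4*hidden -> hidden

# Serial reference computation
Z_ref = gelu(X @ A) @ B

# Megatron-style split across 2 "GPUs":
# A is split column-wise, B row-wise, so GeLU is applied locally
# on each shard with no communication in between.
A0, A1 = np.hsplit(A, 2)
B0, B1 = np.vsplit(B, 2)
Z0 = gelu(X @ A0) @ B0             # computed on GPU 0
Z1 = gelu(X @ A1) @ B1             # computed on GPU 1
Z = Z0 + Z1                        # the all-reduce step

assert np.allclose(Z, Z_ref)
```

The column-then-row ordering is the key design choice: splitting A column-wise means each GPU produces a complete slice of the GeLU input, so the elementwise nonlinearity needs no synchronization, and the single all-reduce after B is the only communication in the block.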
Bibliography
- Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism". March 13, 2020. arXiv:1909.08053.