- (Lewis et al. 2019)
BART is an encoder/decoder (sequence-to-sequence) architecture: the encoder is bidirectional, like BERT, and the decoder is autoregressive (left-to-right), like GPT. It is pre-trained as a denoising autoencoder, reconstructing corrupted input text, which generalizes the two models' pre-training schemes into a single one.
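The layout can be sketched with PyTorch's generic Transformer modules (a toy-sized sketch with made-up dimensions, not the real BART configuration): the encoder runs with no attention mask, so it is bidirectional, while the decoder gets a causal mask and cross-attends to the encoder output.

```python
import torch
import torch.nn as nn

d_model, vocab, seq_len = 64, 100, 8  # toy sizes (assumption, not BART's real config)
embed = nn.Embedding(vocab, d_model)

# BERT-like encoder: no mask, every token attends in both directions
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
# GPT-like decoder: causal self-attention plus cross-attention to the encoder
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
lm_head = nn.Linear(d_model, vocab)

src = torch.randint(0, vocab, (1, seq_len))  # corrupted (noised) input tokens
tgt = torch.randint(0, vocab, (1, seq_len))  # shifted reconstruction targets
memory = encoder(embed(src))                 # bidirectional encoding
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
out = decoder(embed(tgt), memory, tgt_mask=causal_mask)  # left-to-right decoding
logits = lm_head(out)
print(tuple(logits.shape))
```

Training would then apply a cross-entropy loss between `logits` and the original (uncorrupted) tokens, which is the denoising objective described in the paper.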
- Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. 2019. "BART: Denoising Sequence-to-sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv.