- tags: GPT, Transformers, NLP
- paper: (Zhang et al. 2020)
Architecture
The architecture is identical to GPT-2; the difference is the training data: 147M conversation-like exchanges extracted from Reddit comment chains (2005–2017).
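Because the checkpoints share GPT-2's configuration, they load with the plain GPT-2 classes in Hugging Face `transformers`. A minimal sketch, assuming the `microsoft/DialoGPT-medium` checkpoint on the Hub; the sampling settings are illustrative, not the paper's:

```python
# Sketch: DialoGPT loads with the standard GPT-2 classes because the
# architecture is unchanged; only the training data (Reddit dialogs) differs.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-medium")

# Dialog turns are concatenated, with the EOS token acting as a turn separator.
prompt = "Does money buy happiness?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate a response; decoding hyperparameters here are placeholder choices.
output_ids = model.generate(
    input_ids,
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_p=0.9,
)
response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```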
Parameter count
117M, 345M, and 762M (three released sizes; the largest is 762M, smaller than GPT-2's 1.5B)
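A checkpoint's size can be verified by counting parameters directly; a short sketch, assuming PyTorch and the `microsoft/DialoGPT-large` Hub checkpoint:

```python
# Sketch: count the parameters of a loaded checkpoint (PyTorch backend).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-large")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected roughly 762M for the large model
```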
Bibliography
- Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. 2020. "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". arXiv.