Language modeling


LM with RNNs

Several architectures have been studied, starting with the simple recurrent neural network based language model (Mikolov et al. 2011), in which a hidden state carried from one word to the next summarizes the preceding context.

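Long short-term memory (LSTM) networks were later used with more success than these earlier models, in particular once regularized with dropout applied to their non-recurrent connections (Zaremba, Sutskever, and Vinyals 2015).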
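As a minimal sketch of the recurrent idea, the NumPy code below runs an Elman-style RNN over a token sequence, produces a next-token distribution at each position, and scores the sequence with perplexity. It is a simplification under stated assumptions, not the exact architecture of either cited paper: it uses a tanh hidden layer (the original RNNLM used a sigmoid), an embedding lookup (equivalent to multiplying a one-hot input by a matrix), random untrained weights, and hypothetical function names (`init_params`, `rnn_lm_forward`, `perplexity`).

```python
import numpy as np


def init_params(vocab_size, emb_dim, hidden_dim, seed=0):
    """Small random (untrained) weights for a toy RNN language model."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "E": rng.normal(0.0, scale, (vocab_size, emb_dim)),        # word embeddings
        "W_xh": rng.normal(0.0, scale, (hidden_dim, emb_dim)),     # input -> hidden
        "W_hh": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),  # hidden -> hidden (recurrence)
        "b_h": np.zeros(hidden_dim),
        "W_hy": rng.normal(0.0, scale, (vocab_size, hidden_dim)),  # hidden -> vocabulary logits
        "b_y": np.zeros(vocab_size),
    }


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def rnn_lm_forward(tokens, p):
    """Return one next-token distribution per input position, shape (len(tokens), V)."""
    h = np.zeros(p["W_hh"].shape[0])  # initial hidden state: empty context
    dists = []
    for t in tokens:
        x = p["E"][t]  # embedding lookup, same as one-hot vector times E
        h = np.tanh(p["W_xh"] @ x + p["W_hh"] @ h + p["b_h"])  # update the context summary
        dists.append(softmax(p["W_hy"] @ h + p["b_y"]))         # P(next token | tokens so far)
    return np.stack(dists)


def perplexity(tokens, p):
    """exp of the mean negative log-likelihood; the first token is treated as given."""
    dists = rnn_lm_forward(tokens[:-1], p)
    targets = np.asarray(tokens[1:])
    nll = -np.log(dists[np.arange(len(targets)), targets])
    return float(np.exp(nll.mean()))


params = init_params(vocab_size=10, emb_dim=8, hidden_dim=16)
print(perplexity([1, 2, 3, 4, 5], params))  # roughly the vocabulary size for untrained weights
```

With near-zero untrained weights the predicted distributions are close to uniform, so perplexity lands near the vocabulary size; training would push it down. An LSTM variant would replace the single tanh update with gated input, forget, and output updates of a cell state.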

LM with Transformers

Language modeling and compression

Text generation

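Language models can be used to generate text from a prompt or starting sentence. Generation is autoregressive: the model predicts a probability distribution over the next token given the text so far, a token is chosen from it (for example by sampling), appended to the context, and the process repeats. Examples of this kind made models like GPT-2 and GPT-3 famous, because of their ability to generate long sequences of apparently coherent text (Radford et al. 2019; Brown et al. 2020).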
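Generation from a prompt can be sketched as an autoregressive loop in plain Python. To keep the example self-contained, a bigram count model (the next word depends only on the previous one) stands in for a neural language model; with a real model only `next_token_distribution` would change. All function names are hypothetical, and temperature scaling (raising each probability to the power 1/T before sampling) is one common decoding choice among several, not something the text prescribes.

```python
import random
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count word -> next-word transitions, with sentence start/end markers."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """P(next word | previous word) estimated from counts (empty if prev is unseen)."""
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}


def generate(counts, prompt, max_tokens=20, temperature=1.0, rng=None):
    """Extend the prompt one sampled word at a time until </s> or max_tokens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    tokens = prompt.split()
    prev = tokens[-1] if tokens else "<s>"
    for _ in range(max_tokens):
        dist = next_token_distribution(counts, prev)
        if not dist:  # no known continuation
            break
        words = list(dist)
        # T < 1 sharpens the distribution, T > 1 flattens it; choices() normalizes weights
        weights = [dist[w] ** (1.0 / temperature) for w in words]
        nxt = rng.choices(words, weights=weights, k=1)[0]
        if nxt == "</s>":
            break
        tokens.append(nxt)  # feed the choice back in as context
        prev = nxt
    return " ".join(tokens)


corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = train_bigram(corpus)
print(generate(model, "the cat", max_tokens=10, rng=random.Random(0)))
```

A bigram model has no memory beyond one word, which is exactly the limitation that recurrent and Transformer language models address by conditioning on the whole prefix.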


Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” arXiv:2005.14165 [Cs], June.

Mikolov, Tomas, Martin Karafiat, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2011. “Recurrent Neural Network Based Language Model.” In Interspeech 2011, 4.

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” OpenAI Blog 1 (8):9.

Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. 2015. “Recurrent Neural Network Regularization.” arXiv:1409.2329 [Cs], February.
