Language modeling


LM with RNNs

Several architectures have been studied, starting with the original recurrent neural network based language model (Mikolov et al. 2011).

LSTMs were then used with more success than earlier models (Zaremba, Sutskever, and Vinyals 2015).

More recently, Transformers have come to dominate language modeling. However, it is not clear whether this is due to a real superiority over RNNs or to their practical scalability (Merity 2019).

LM with Transformers

Language modeling and Compression

Text generation

Language models can be used to generate text from a prompt or starting sentence. This is the kind of example that made models like GPT-2 and GPT-3 famous, thanks to their ability to generate long sequences of apparently coherent text (Radford et al. 2019; Brown et al. 2020).
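As a rough illustration of generating from a prompt, here is a minimal sketch that assumes a toy character-level bigram model in place of a neural network. The names `train_bigram` and `generate`, and the tiny corpus, are hypothetical and chosen only for this example. GPT-2 and GPT-3 condition on a much longer context with a Transformer, but the outer loop has the same shape: sample the next token from the model's distribution, append it, and repeat.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each character, which characters follow it in `text`."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_new_chars, seed=0):
    """Extend `prompt` one character at a time.

    Each new character is sampled from the empirical distribution of
    characters that followed the current last character in training.
    """
    if not prompt:
        raise ValueError("prompt must contain at least one character")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    out = list(prompt)
    for _ in range(max_new_chars):
        followers = counts.get(out[-1])
        if not followers:  # last character was never seen with a successor
            break
        chars = list(followers.keys())
        weights = list(followers.values())
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


# Toy corpus, an assumption for illustration only.
corpus = "the cat sat on the mat. the dog sat on the log."
model = train_bigram(corpus)
sample = generate(model, prompt="the ", max_new_chars=40)
print(sample)
```

With only one character of context the output is locally plausible but globally incoherent; the long-range coherence mentioned above comes from conditioning on far more context than this sketch does.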

Other applications

Language modeling for Automated theorem proving

(Polu and Sutskever 2020)

Language modeling for Reinforcement Learning

(Janner, Li, and Levine, n.d.)


  1. Brown et al. 2020. “Language Models Are Few-Shot Learners”. arXiv:2005.14165 [Cs], June.

  2. Janner, Li, and Levine. n.d. “Reinforcement Learning as One Big Sequence Modeling Problem”, 15.

  3. Mikolov et al. 2011. “Recurrent Neural Network Based Language Model”. In Interspeech 2011, 4.

  4. Polu and Sutskever. 2020. “Generative Language Modeling for Automated Theorem Proving”. arXiv:2009.03393 [Cs, Stat], September.

  5. Radford et al. 2019. “Language Models Are Unsupervised Multitask Learners”. OpenAI Blog 1 (8):9.

  6. Zaremba, Sutskever, and Vinyals. 2015. “Recurrent Neural Network Regularization”. arXiv:1409.2329 [Cs], February.

  7. Merity. 2019. “Single Headed Attention RNN: Stop Thinking with Your Head”. arXiv:1911.11423 [Cs], November.
