One model for the learning of language by Yang, Y., & Piantadosi, S. T. (2022)

(Yang, Piantadosi 2022)
NLP, Artificial Intelligence, Machine learning


This paper introduces a model for learning language from few examples while generalizing effectively.

This model builds sentences with function that are combinations of elementary functions, including:

  • pair(L, C) : Concatenates character C onto list L
  • first(L) : Return the first character of L
  • flip(P) : Return true with probability P
  • if(B, X, Y) : Return X if B else return Y (X and Y may be lists, sets, or probabilities)
  • etc.

(See Table 1 in (Yang, Piantadosi 2022) for the complete list)

The space of possible functions constructed by composing the elementary ones above is denoted \(H\). The method wants to compute a posterior distribution over hypotheses \(P(H |D)\) given an example string set \(D\). This is estimated using Bayes rule: \[ P(H|D) \propto P(D | H) P(H) \]

\(P(H)\) is given by a probabilistic context-free grammar a priori and the other term is estimated by maximizing likelihood.


  • The authors propose a special definition of that likelihood \(P(D|H)\) such that it can be used incrementally on parts of a string only. Therefore if \(H\) generates strings similar but not equal to \(D\) it still benefits from it.
  • Hypotheses can be constructed from multiple other sub-hypotheses.
  • For non-deterministic expressions, multiple execution path are considered simultaneously.


It is hard to tell how scalable the method is. Also relatively few details are given about the inference and how it is performed.

I think it is very cool to see “constructive” models of language that can “compute” the language instead of simply predicting the next token statistically. It also feels like this has a much better potential to be a data efficient language model.


  1. . . "One Model for the Learning of Language". Proceedings of the National Academy of Sciences 119 (5):e2021865119. DOI.


← Back to Notes