The authors use simple text classification tasks to investigate whether these learned properties can be understood by looking at the state dynamics of RNNs.
The RNNs usually behave like attractor networks, with the hidden state lying on a low-dimensional manifold.
The first task is to classify sentences based on the number of evidence words corresponding to a target class. A simple solution to this problem is a counter that returns the class with the most evidence words.
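As a point of reference, the counter solution can be sketched in a few lines of Python. The vocabulary `EVIDENCE` and the function name `classify_by_count` are made up for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical evidence vocabulary: each evidence word votes for one class.
# Any word not listed here is treated as neutral.
EVIDENCE = {
    "goal": "sports", "match": "sports",
    "vote": "politics", "senate": "politics",
    "atom": "science", "lab": "science",
}

def classify_by_count(sentence, evidence=EVIDENCE):
    """Return the class with the most evidence words, or None if there are none."""
    counts = Counter(evidence[w] for w in sentence.lower().split() if w in evidence)
    if not counts:
        return None
    # Ties are broken arbitrarily (by first occurrence in the sentence).
    return counts.most_common(1)[0][0]
```

For instance, `classify_by_count("The senate vote came before the match")` counts two politics words against one sports word.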
With 3 classes, the learned network functions essentially like an integrator working mostly on a 2D equilateral triangle. Each evidence word moves the hidden state towards a corner of this triangle, while neutral words leave the hidden state unchanged.
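To make the geometry concrete, here is an idealized sketch of such an integrator, not the trained network itself. It assumes a 2D state that starts at the origin, unit steps towards fixed triangle corners, and a readout that picks the corner best aligned with the final state. Unlike the learned dynamics, this toy state is unbounded.

```python
import numpy as np

# Corners of an equilateral triangle centered on the origin (assumed layout).
_angles = np.deg2rad([90.0, 210.0, 330.0])
CORNERS = np.stack([np.cos(_angles), np.sin(_angles)], axis=1)  # shape (3, 2)

def integrate(word_classes):
    """Accumulate evidence in 2D; class indices 0..2 are evidence words, None is neutral."""
    h = np.zeros(2)
    for c in word_classes:
        if c is not None:
            h = h + CORNERS[c]  # evidence word: step towards its corner
        # neutral word: the state does not move
    return h

def readout(h):
    """Predict the class whose corner direction has the largest projection of h."""
    return int(np.argmax(CORNERS @ h))
```

Because the three corner vectors sum to zero, the projection of the state onto corner \(j\) is \(1.5\,c_j - 0.5\sum_k c_k\), where \(c_k\) is the number of evidence words for class \(k\). The readout therefore agrees with majority counting.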
For a varying number of classes \(N\), the authors show that mostly \(N-1\) dimensions are used for classification, explaining 95% of the variance of the hidden state.
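The measurement itself is a PCA on hidden states. The sketch below runs it on synthetic data only: an idealized integrator with \(N\) class directions, randomly embedded in a 64-dimensional space with a little noise. All function names and parameters are assumptions for illustration, and the \(N-1\) structure is built into the data by construction, so this shows how the number is computed, not a replication of the result.

```python
import numpy as np

def simplex_directions(n_classes):
    """N unit directions spanning N-1 dimensions (centered one-hot vectors)."""
    d = np.eye(n_classes) - 1.0 / n_classes
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def synthetic_hidden_states(n_classes, hidden_dim=64, n_seq=200, seq_len=50,
                            noise=0.01, seed=0):
    """Hidden states of an idealized integrator embedded in a larger space."""
    rng = np.random.default_rng(seed)
    directions = simplex_directions(n_classes)
    # Random orthonormal embedding into the hidden space, shape (hidden_dim, N).
    embed = np.linalg.qr(rng.normal(size=(hidden_dim, n_classes)))[0]
    # Word index n_classes stands for a neutral word (zero step).
    words = rng.integers(0, n_classes + 1, size=(n_seq, seq_len))
    steps = np.vstack([directions, np.zeros(n_classes)])[words]
    states = np.cumsum(steps, axis=1) @ embed.T
    states = states + noise * rng.normal(size=states.shape)
    return states.reshape(-1, hidden_dim)

def dims_for_variance(states, threshold=0.95):
    """Smallest number of principal components reaching the variance threshold."""
    centered = states - states.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratio, threshold) + 1)
```

On real data, `states` would instead be the hidden states collected from a trained RNN over a test set, one row per time step.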
Interestingly, the learned attractors are more or less similar on natural classification data. An RNN learns, for each word, a direction that moves the hidden state towards the corresponding class.
With the more involved task of ordered classification (star review prediction), RNNs still learn low-dimensional attractors. The integration now appears to be twofold: sentiment and intensity both play a role in the final score.
With multi-label classification, an RNN keeps track of all class combinations as if they were different classes.
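One way to picture this is to treat every on/off combination of labels as its own class. The sketch below only illustrates that bookkeeping; the function name and the label names are hypothetical.

```python
from itertools import product

def combination_classes(labels):
    """Map every on/off combination of the labels to its own class index."""
    combos = product([False, True], repeat=len(labels))
    return {
        frozenset(label for label, on in zip(labels, combo) if on): idx
        for idx, combo in enumerate(combos)
    }
```

With \(K\) binary labels this gives \(2^K\) combined classes, so `combination_classes(["sports", "politics"])` has four entries, including the empty set.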
I’m particularly interested in this kind of work trying to understand how these neural networks work. Gradient descent seems pretty good at finding shortcuts in data. This makes it particularly efficient for relatively simple tasks like sentence classification or relatively OK language modeling, but it fails to construct more complex primitives or attractors.
Neuroscience seems to have shown that at least some of our brain functions use attractor dynamics like RNNs, but these were likely not found through the same kind of optimization.
It is interesting to think about this in connection with (Katharopoulos et al. 2020). This also means that powerful transformers act like some kind of fancy integrator in a large space. It seems like this would limit their capabilities, since our brain doesn’t look like it’s only doing integration.
Aitken, Kyle, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, and Niru Maheswaranathan. 2020. “The Geometry of Integration in Text Classification RNNs.” arXiv:2010.15114 [Cs, Stat], October.