This is done by outputting an extra halting probability at each update step, and considering two timelines:
- the input timeline, which plays the role of an outer loop: at each of its steps, a new input symbol is fed to the RNN, and the step yields a single output vector.
- the internal processing timeline, which is the inner loop run at each input step. It runs until the cumulative halting probability exceeds a threshold (1 - ε in the paper, for a small ε) and emits one intermediate output per inner step. These intermediate outputs are combined into the single output vector of the input step as a mean weighted by the halting probabilities.
- Alex Graves. 2016. "Adaptive Computation Time for Recurrent Neural Networks". arXiv:1603.08983 [cs]. http://arxiv.org/abs/1603.08983.
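
The two timelines described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: `toy_cell` is a hypothetical scalar cell invented for the example, and `eps` and `max_ponder` are illustrative parameter names for the threshold margin and a cap on inner steps. The last inner step receives the remaining probability mass, so the weights always sum to 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_cell(state, x, first):
    """Hypothetical scalar RNN cell (an assumption made for this sketch).

    Returns (new_state, output, halting_probability). The `first` flag
    marks the first inner step for a given input symbol.
    """
    new_state = math.tanh(0.5 * state + x + (1.0 if first else 0.0))
    output = 2.0 * new_state
    halt = sigmoid(new_state)
    return new_state, output, halt


def act_step(state, x, cell=toy_cell, eps=0.01, max_ponder=10):
    """Inner loop: internal processing timeline for ONE input symbol."""
    cum_halt = 0.0     # cumulative halting probability so far
    mean_state = 0.0   # halting-weighted mean of intermediate states
    mean_output = 0.0  # halting-weighted mean of intermediate outputs
    s = state
    for n in range(1, max_ponder + 1):
        s, y, h = cell(s, x, first=(n == 1))
        # Stop once the cumulative probability reaches the threshold,
        # or when the cap on inner steps is hit.
        last = (cum_halt + h >= 1.0 - eps) or (n == max_ponder)
        # The final step gets the remainder so the weights sum to 1.
        p = 1.0 - cum_halt if last else h
        mean_state += p * s
        mean_output += p * y
        cum_halt += h
        if last:
            break
    return mean_state, mean_output, n


def run_act(inputs, state=0.0):
    """Outer loop: input timeline, one output per input symbol."""
    outputs, ponder_steps = [], []
    for x in inputs:
        state, y, n = act_step(state, x)
        outputs.append(y)
        ponder_steps.append(n)
    return outputs, ponder_steps


if __name__ == "__main__":
    outs, steps = run_act([0.1, -2.0, 0.5])
    print(outs, steps)
```

`run_act` returns one output per input symbol together with the number of inner steps spent on it, which makes it easy to see how the amount of computation varies from one input to the next.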