Recursive Transformer

Recursive Transformers are a modification of the Transformer architecture in which a single set of shared layers is applied repeatedly, in a recursive fashion, instead of stacking many layers that each have distinct weights. This form of Weight Sharing lets a model with only a few unique layers reach results comparable to a deeper network whose layers each have their own parameters.
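The weight-sharing idea can be illustrated with a minimal sketch, assuming a toy block in place of a real Transformer layer. `ToyBlock`, `recursive_forward`, and `stacked_forward` are hypothetical names, and the block is reduced to a residual `x + relu(Wx)` update rather than self-attention plus a feed-forward network. It also omits details the Universal Transformer adds, such as per-step timestep embeddings and adaptive halting.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class ToyBlock:
    """Hypothetical stand-in for one Transformer block (attention + feed-forward).

    To stay dependency-free, the block is reduced to a residual update
    x + relu(W x); only the weight-sharing pattern matters in this sketch.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weights: Matrix = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)
        ]

    def num_params(self) -> int:
        # One weight per matrix entry (no biases in this toy block).
        return sum(len(row) for row in self.weights)

    def __call__(self, x: Vector) -> Vector:
        # Matrix-vector product followed by ReLU, then a residual connection.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.weights]
        return [xi + hi for xi, hi in zip(x, hidden)]


def recursive_forward(x: Vector, block: ToyBlock, steps: int) -> Vector:
    """Recursive Transformer style: apply the *same* block `steps` times."""
    for _ in range(steps):
        x = block(x)
    return x


def stacked_forward(x: Vector, blocks: List[ToyBlock]) -> Vector:
    """Conventional baseline: every layer in the stack has its own weights."""
    for block in blocks:
        x = block(x)
    return x


if __name__ == "__main__":
    dim, depth = 8, 6
    x = [1.0] * dim

    shared = ToyBlock(dim, seed=0)
    distinct = [ToyBlock(dim, seed=i + 1) for i in range(depth)]

    y_rec = recursive_forward(x, shared, steps=depth)
    y_deep = stacked_forward(x, distinct)

    # Same effective depth (6 block applications), very different parameter counts.
    print("recursive params:", shared.num_params())                   # 64
    print("stacked params:  ", sum(b.num_params() for b in distinct))  # 384
```

Both forward passes apply six blocks, but the recursive model stores a single block's weights (64 parameters for `dim=8`) while the conventional stack stores six (384). In this toy, the compute per forward pass is unchanged; only the number of stored parameters shrinks.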

First described in the Universal Transformers paper (Dehghani et al., 2018).