Humans have a remarkable ability to generalize systematically, reasoning about new situations by combining aspects of previous experiences. Language provides one of the primary examples of this ability, and modern machine learning has drawn much inspiration from linguistics. A recent example is iterated learning, a procedure in which generations of networks learn from the output of earlier learners. The result is a refinement of the network’s “language”, or output labels for given inputs, toward compositional structure. Here we theoretically study the emergence of compositional language and the ability of simple neural networks to leverage this compositionality to systematically generalize. We build on prior theoretical work on linear networks, which mathematically defines systematic generalization, by a) applying the analysis of shallow and deep linear networks to the iterated learning procedure, deriving exact dynamics of learning over generations; and b) refining the definition of systematicity to understand the benefits and limitations of iterated learning. We find that iterated learning does facilitate systematic generalization over standard training paradigms by uncovering compositional substructure in the output labels. Our results confirm a long-standing conjecture: that multiple generations of iterated learning are required for compositional structure to emerge, and that this procedure can outperform a single-generation network trained with optimal early stopping. However, for the network to treat the input systematically and ignore features that do not generalize, it must be trained on an extremely large dataset. Hence, we define “weak systematic generalization” to explain this emergent systematicity from scale.
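The iterated learning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' setup: it assumes a single-layer linear network trained by full-batch gradient descent on squared error, and the names `train_linear`, `predict`, and `iterated_learning`, along with the parameters `steps`, `lr`, and `seed`, are hypothetical choices made for illustration only. Each generation starts from fresh small random weights, trains for a fixed number of steps, and then relabels the inputs for the next generation.

```python
import random


def train_linear(inputs, targets, steps, lr, rng):
    """Fit a linear map W (n_out x n_in) by full-batch gradient descent."""
    n_in, n_out = len(inputs[0]), len(targets[0])
    # Fresh, small random initialization for every new learner (an assumption).
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
    n = len(inputs)
    for _ in range(steps):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x, y in zip(inputs, targets):
            pred = [sum(w * xi for w, xi in zip(row, x)) for row in W]
            for o in range(n_out):
                err = pred[o] - y[o]
                for i in range(n_in):
                    grad[o][i] += err * x[i] / n
        for o in range(n_out):
            for i in range(n_in):
                W[o][i] -= lr * grad[o][i]
    return W


def predict(W, inputs):
    """Apply the linear map W to every input vector."""
    return [[sum(w * xi for w, xi in zip(row, x)) for row in W] for x in inputs]


def iterated_learning(inputs, labels, generations, steps, lr, seed=0):
    """Train a chain of learners; each one fits the previous learner's outputs."""
    rng = random.Random(seed)
    history = []
    targets = labels  # the first generation sees the original labels
    for _ in range(generations):
        W = train_linear(inputs, targets, steps, lr, rng)
        history.append(W)
        targets = predict(W, inputs)  # relabel the data for the next generation
    return history
```

In this sketch the fixed `steps` budget plays the role of early stopping within a generation; how many generations and steps are appropriate is a modeling choice the abstract does not specify.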
Devon Jarvis
Richard Klein
Benjamin Rosman
Proceedings of the National Academy of Sciences
University College London
University of the Witwatersrand
Sainsbury Laboratory
Jarvis et al. studied this question.
www.synapsesocial.com/papers/69fbefd5164b5133a91a3ecf — DOI: https://doi.org/10.1073/pnas.2509739123