A theory of initialisation's impact on specialisation*

Saved in:
Detailed bibliography
Title: A theory of initialisation's impact on specialisation*
Authors: Jarvis, Devon; Lee, Sebastian; Dominé, Clémentine Carla Juliette; Saxe, Andrew M.; Sarao Mannelli, Stefano
Source: Journal of Statistical Mechanics: Theory and Experiment, 2025(11)
Subjects: analysis of algorithms, online dynamics, deep learning, machine learning
Description: Prior work has demonstrated a consistent tendency in neural networks engaged in continual learning tasks, wherein intermediate task similarity results in the highest levels of catastrophic interference. This phenomenon is attributed to the network's tendency to reuse learned features across tasks. However, this explanation relies heavily on the premise that neuron specialisation occurs, i.e. the emergence of localised representations. Our investigation challenges the validity of this assumption. Using theoretical frameworks for the analysis of neural networks, we show a strong dependence of specialisation on the initial conditions. More precisely, we show that weight imbalance and high weight entropy can favour specialised solutions. We then apply these insights in the context of continual learning, first showing the emergence of a monotonic relation between task similarity and forgetting in non-specialised networks. Finally, we show that specialisation induced by weight imbalance is beneficial when combined with the commonly employed elastic weight consolidation (EWC) regularisation technique.
File description: electronic
Access URL: https://research.chalmers.se/publication/549475
https://research.chalmers.se/publication/549475/file/549475_Fulltext.pdf
Database: SwePub
ISSN: 1742-5468
DOI: 10.1088/1742-5468/ae1214
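The abstract's final claim concerns elastic weight consolidation (EWC), a continual-learning regulariser that penalises moving parameters that were important for a previous task. The sketch below shows the standard EWC penalty only in its generic form; the function and variable names (`ewc_penalty`, `fisher_diag`, `lambda_ewc`) are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher_diag, lambda_ewc=1.0):
    """Quadratic EWC penalty: (lambda/2) * sum_i F_i * (theta_i - theta*_i)^2.

    fisher_diag approximates the diagonal of the Fisher information for the
    previous task; it weights how costly it is to move each parameter away
    from old_params, the value it held after training on that task.
    """
    return 0.5 * lambda_ewc * np.sum(fisher_diag * (params - old_params) ** 2)

def total_loss(task_loss, params, old_params, fisher_diag, lambda_ewc=1.0):
    # New-task loss plus the quadratic anchor toward the old-task solution.
    return task_loss + ewc_penalty(params, old_params, fisher_diag, lambda_ewc)
```

Parameters with large Fisher values are anchored strongly, while unimportant ones remain free to adapt to the new task, which is why the abstract's point about whether representations are localised (specialised) to particular neurons matters for how well this penalty protects old tasks.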