The complexity dynamics of grokking

Bibliographic Details
Published in: Physica D, Vol. 482, p. 134859
Main Authors: DeMoss, Branton; Sapora, Silvia; Foerster, Jakob; Hawes, Nick; Posner, Ingmar
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2025
ISSN: 0167-2789
Description
Summary: We demonstrate the existence of a complexity phase transition in neural networks by studying the grokking phenomenon, where networks suddenly transition from memorization to generalization long after overfitting their training data. To characterize this phase transition, we introduce a theoretical framework for measuring complexity based on rate–distortion theory and Kolmogorov complexity, which can be understood as principled lossy compression for networks. We find that properly regularized networks exhibit a sharp phase transition: complexity rises during memorization, then falls as the network discovers a simpler underlying pattern that generalizes. In contrast, unregularized networks remain trapped in a high-complexity memorization phase. We establish an explicit connection between our complexity measure and generalization bounds, providing a theoretical foundation for the link between lossy compression and generalization. Our framework achieves compression ratios 30–40× better than naïve approaches, enabling precise tracking of complexity dynamics. Finally, we introduce a regularization method based on spectral entropy that encourages networks toward low-complexity representations by penalizing their intrinsic dimension.
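
Illustrative sketches (not from the article): the abstract names two ideas that can be made concrete in a few lines of PyTorch, under stated assumptions. First, a naive lossy-compression proxy for network complexity: uniformly quantize the parameters and measure the zlib-compressed size. This is the kind of baseline the authors report improving on by 30–40×, not their rate–distortion estimator; function names and the bit width are illustrative choices.

import io
import zlib

import numpy as np
import torch
import torch.nn as nn

def compressed_size_bits(model: nn.Module, n_bits: int = 8) -> int:
    """Naive complexity proxy: uniformly quantize every parameter tensor
    to n_bits and return the zlib-compressed size of the bytes, in bits."""
    buf = io.BytesIO()
    for p in model.parameters():
        w = p.detach().cpu().numpy().ravel()
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / (2 ** n_bits - 1) if hi > lo else 1.0
        q = np.round((w - lo) / scale).astype(np.uint8)  # n_bits <= 8 assumed
        buf.write(q.tobytes())
    return 8 * len(zlib.compress(buf.getvalue(), level=9))

Second, a spectral-entropy penalty in the spirit of the regularizer the abstract describes: the Shannon entropy of a weight matrix's normalized singular-value spectrum is low when a few directions dominate (low intrinsic dimension), so adding it to the training loss pushes the network toward low-dimensional weights. The exact form used in the paper may differ; lambda_se is a hypothetical hyperparameter.

def spectral_entropy(weight: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy of the normalized singular-value spectrum of a matrix."""
    s = torch.linalg.svdvals(weight)        # singular values
    p = s / (s.sum() + eps)                 # normalize to a distribution
    return -(p * torch.log(p + eps)).sum()  # low entropy ~ few dominant directions

def spectral_entropy_penalty(model: nn.Module) -> torch.Tensor:
    """Sum the spectral entropy over all 2-D weight matrices in the model."""
    return sum(spectral_entropy(p) for p in model.parameters() if p.ndim == 2)

# Hypothetical training step:
#   loss = task_loss + lambda_se * spectral_entropy_penalty(model)
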
DOI: 10.1016/j.physd.2025.134859