Minimal algorithmic information loss methods for dimension reduction, feature selection and network sparsification

Bibliographic details
Published in: Information Sciences, Vol. 720, Art. no. 122520
Authors: Zenil, Hector; Kiani, Narsis A.; Adams, Alyssa; Abrahão, Felipe S.; Rueda-Toicen, Antonio; Zea, Allan A.; Ozelim, Luan; Tegnér, Jesper
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.12.2025
ISSN: 0020-0255

Abstract: We present a novel, domain-agnostic, model-independent, unsupervised, and universally applicable Machine Learning approach to dimensionality reduction based on the principles of algorithmic complexity. Specifically, but without loss of generality, we focus on the challenge of reducing certain dimensionality aspects, such as the number of edges in a network, while retaining essential features of interest. These features include crucial network properties such as the degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities; the method also extends beyond edges to nodes and weights for network pruning and trimming. Our approach outperforms classical statistical Machine Learning techniques and state-of-the-art dimensionality reduction algorithms by preserving data features that statistical algorithms would miss, particularly nonlinear patterns stemming from deterministic recursive processes that may look statistically random but are not. Moreover, previous approaches rely heavily on a priori feature selection, which requires constant supervision. Our findings demonstrate the effectiveness of the algorithms in overcoming some of these limitations while maintaining a time-efficient computational profile. Our approach not only matches but exceeds the performance of established and state-of-the-art dimensionality reduction algorithms. We also extend the applicability of our method to lossy compression tasks involving images and other multi-dimensional data, highlighting the versatility and broad utility of the approach across domains.

Highlights:
• Unsupervised, model-free dimensionality reduction using complexity theory.
• Preserves key data properties better than traditional techniques.
• Captures nonlinear patterns missed by statistical ML methods.
• No a priori feature selection or supervision required.
• Next-generation cognitive neuro-symbolic ML for multi-modal data.
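
The abstract describes reducing a network by deleting elements, such as edges, so that as little algorithmic information as possible is lost. The sketch below is illustrative only: it greedily removes the edges whose deletion perturbs a complexity estimate the least, substituting the zlib-compressed length of the adjacency matrix for the algorithmic-complexity estimators the authors use. The function names (complexity_proxy, sparsify) and the greedy loop are assumptions made for illustration, not the paper's algorithm.

    import zlib
    import numpy as np

    def complexity_proxy(adj):
        # Compressed length of the flattened adjacency matrix: a crude
        # stand-in for an algorithmic-complexity estimate of the graph.
        return len(zlib.compress(np.packbits(adj.astype(np.uint8)).tobytes(), 9))

    def sparsify(adj, n_remove):
        # Greedily delete the edges whose removal changes the complexity
        # estimate the least, i.e. the edges carrying the least information.
        adj = adj.copy()
        for _ in range(n_remove):
            rows, cols = np.nonzero(np.triu(adj, 1))
            if rows.size == 0:
                break
            base = complexity_proxy(adj)
            best_edge, best_delta = None, None
            for i, j in zip(rows, cols):
                w = adj[i, j]
                adj[i, j] = adj[j, i] = 0                  # tentatively remove edge
                delta = abs(complexity_proxy(adj) - base)  # estimated information lost
                adj[i, j] = adj[j, i] = w                  # restore
                if best_delta is None or delta < best_delta:
                    best_edge, best_delta = (i, j), delta
            i, j = best_edge
            adj[i, j] = adj[j, i] = 0
        return adj

    # Example: drop 5 edges from a small random undirected graph.
    rng = np.random.default_rng(0)
    a = np.triu((rng.random((12, 12)) < 0.3).astype(np.uint8), 1)
    a = a + a.T
    print(a.sum() // 2, "edges before,", sparsify(a, 5).sum() // 2, "edges after")

A better compressor, or a proper algorithmic-complexity estimator, can be swapped into complexity_proxy without changing the greedy loop; the paper itself should be consulted for the estimators and deletion strategy actually used.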
DOI: 10.1016/j.ins.2025.122520