A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests

Detailed bibliography
Title: A Hybrid Framework for Real-Time Data Drift and Anomaly Identification Using Hierarchical Temporal Memory and Statistical Tests
Authors: Subhadip Bandyopadhyay, Joy Bose, Sujoy Roy Chowdhury
Source: International Journal of Mathematical, Engineering and Management Sciences, Vol 10, Iss 3, Pp 777-796 (2025)
Publication Status: Preprint
Publisher information: Ram Arti Publishers, 2025.
Publication year: 2025
Subjects: telecom network monitoring, FOS: Computer and information sciences, Technology, Computer Science - Machine Learning, real-time anomaly detection, I.2.6, I.2.7, G.3, 62M10, 62P30, 68T07, data drift detection, H.2.8, H.3.3, Machine Learning (cs.LG), hybrid machine learning models, ai powered data drift detection, QA1-939, hierarchical temporal memory (htm), time series, Mathematics, sequential probability ratio test (sprt), streaming data analysis
Description: Data drift refers to the phenomenon where the generating model behind the data changes over time. Due to data drift, any model built on past training data becomes less relevant and less accurate over time. Thus, detecting and controlling for data drift is critical in machine learning models. Hierarchical Temporal Memory (HTM) is a machine learning model developed by Jeff Hawkins, inspired by how the human brain processes information. It is a biologically inspired model of memory, similar in structure to the neocortex, whose performance is claimed to be comparable to state-of-the-art models in detecting anomalies in time series data. Another benefit of HTMs is their independence from training and testing cycles: all learning takes place online with streaming data, and no separate training and testing cycle is required. In the sequential learning paradigm, the Sequential Probability Ratio Test (SPRT) offers unique benefits for online learning and inference. This paper proposes a novel hybrid framework combining HTM and SPRT for real-time data drift detection and anomaly identification. Unlike existing data drift methods, our approach eliminates frequent retraining and ensures low false positive rates. HTMs currently work with one-dimensional (univariate) data. In a second study, we also propose an application of HTM in a multidimensional supervised scenario for anomaly detection by combining the outputs of multiple HTM columns, one for each data dimension, through a neural network. Experimental evaluations demonstrate that the proposed method outperforms conventional drift detection techniques such as the Kolmogorov-Smirnov (KS) test, Wasserstein distance, and Population Stability Index (PSI) in terms of accuracy, adaptability, and computational efficiency. Our experiments also provide insights into optimizing hyperparameters for real-time deployment in domains such as telecom.
Document type: Article
Language: English
ISSN: 2455-7749
DOI: 10.33889/ijmems.2025.10.3.039
DOI: 10.48550/arxiv.2504.18599
Access URL: http://arxiv.org/abs/2504.18599
https://doaj.org/article/64d306aa2fe347eb8fa0d5c9f0c2af63
https://doi.org/10.33889/IJMEMS.2025.10.3.039
Rights: CC BY
Accession number: edsair.doi.dedup.....b08e1fdfe1537e8ce3c16077f5521873
Database: OpenAIRE
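
Illustrative note: the description refers to the Sequential Probability Ratio Test (SPRT) as the sequential component of the framework. The sketch below shows the generic Wald SPRT on a stream of values, not the authors' implementation. It assumes Gaussian observations with known standard deviation, and the function name `sprt_gaussian_mean` and the parameters `mu0`, `mu1`, `sigma`, `alpha`, and `beta` are hypothetical choices made for this example only. How the paper actually couples SPRT with HTM outputs is not stated in this record.

```python
import math


def sprt_gaussian_mean(stream, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1 (known sigma).

    Assumption for illustration: observations are i.i.d. Gaussian.
    Returns (decision, n_used), where decision is "H0", "H1", or
    "undecided" if the stream ends before a boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)  # crossing this accepts H1 (drift)
    lower = math.log(beta / (1 - alpha))  # crossing this accepts H0 (no drift)
    llr = 0.0  # cumulative log-likelihood ratio
    n = 0
    for x in stream:
        n += 1
        # Gaussian log-likelihood ratio increment log f1(x) / f0(x)
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n


if __name__ == "__main__":
    # Values sitting at the shifted mean push the test toward H1.
    print(sprt_gaussian_mean([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
```

The test stops as soon as the evidence crosses either boundary, which is why a sequential test suits streaming data: the number of samples consumed adapts to how clear the shift is, while `alpha` and `beta` set the targeted false positive and false negative rates.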