Novel metrics and LSH algorithms for unsupervised, real-time anomaly detection in multi-aspect data streams

Detailed bibliography
Published in: Engineering science and technology, an international journal, Vol. 69, p. 102119
Main authors: Khodabandehlou, Samira; Hashemi Golpayegani, Alireza
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 September 2025
ISSN: 2215-0986, 2215-0986
Description
Summary: Given a vast online stream of transactions in e-markets, how can we detect fraudulent traders and suspicious behaviors in an unsupervised manner? Can we detect them in constant time and memory? Fraud detection in e-markets is increasingly challenging due to the scale and complexity of multi-aspect data streams. This study introduces SATrade, an unsupervised and scalable approach for real-time anomaly detection in big multi-aspect data streams. The approach proposes two novel Locality-Sensitive Hashing (LSH) functions: Gaussian projections that preserve numerical distances, and collision-resistant linear hashing that avoids inflating the dimensionality of categorical data. The main contributions include the Collusiveness metric, which detects group anomalies through statistical divergence analysis, and the RR-ISF, which prioritizes rare burst patterns. An exponential decay mechanism, controlled by λ, lets the model adapt to evolving fraud tactics without retraining, while PCA handles feature correlation. In extensive experiments on five real datasets, using both synthetic and real labels, SATrade achieved 99% AUC, a 93% F-measure, and a latency of 0.2 ms per record, a significant improvement over six baseline methods. The framework's interpretability allows anomalies to be traced back to fraudulent behaviors such as sudden order spikes. Its constant memory consumption of 0.25 MB per record and its linear scalability make SATrade suitable for high-frequency environments and online platforms.
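The summary names SATrade's building blocks but not their formulas. As a rough, hypothetical illustration only (this is not the authors' code and not their exact method), the Python sketch below combines two ideas the summary mentions: Gaussian (p-stable) LSH, h(x) = floor((a·x + b)/w), which tends to place nearby numerical records in the same bucket, and exponentially decayed bucket counts. The class `DecayedLSHScorer`, its parameters (`width`, `decay`, `num_buckets`), the lazy-decay scheme and the rarity score 1/(1 + count) are all assumptions; the categorical hashing, PCA step, Collusiveness metric and RR-ISF are not reproduced.

```python
import math
import random


class DecayedLSHScorer:
    """Hypothetical streaming anomaly scorer (a sketch, NOT SATrade itself).

    Numerical records are hashed with p-stable (Gaussian) LSH,
    h(x) = floor((a . x + b) / w), so nearby records tend to share buckets.
    Each hash table keeps exponentially decayed counts in a fixed-size
    array, so memory does not grow with the length of the stream.
    """

    def __init__(self, dim, num_tables=4, width=1.0, decay=0.99,
                 num_buckets=1024, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.decay = decay  # assumed decay factor, standing in for lambda
        self.num_buckets = num_buckets
        # One Gaussian projection vector a and one offset b ~ U[0, w) per table.
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in range(num_tables)]
        self.offsets = [rng.uniform(0.0, width) for _ in range(num_tables)]
        # Decayed count and last-update time for every bucket of every table.
        self.counts = [[0.0] * num_buckets for _ in range(num_tables)]
        self.last_seen = [[0] * num_buckets for _ in range(num_tables)]

    def _bucket(self, table, x):
        """p-stable LSH bucket, folded (mod num_buckets) into the array."""
        a, b = self.projections[table], self.offsets[table]
        dot = sum(ai * xi for ai, xi in zip(a, x))
        return math.floor((dot + b) / self.width) % self.num_buckets

    def score_and_update(self, x, t):
        """Score record x arriving at integer time t, then count it.

        Returns a value in (0, 1]; values near 1 mean x fell into buckets
        with little recent traffic (a simple rarity signal).
        """
        score = 0.0
        for table in range(len(self.counts)):
            idx = self._bucket(table, x)
            # Lazy exponential decay: age a count only when it is touched.
            elapsed = t - self.last_seen[table][idx]
            count = self.counts[table][idx] * self.decay ** elapsed
            score += 1.0 / (1.0 + count)
            self.counts[table][idx] = count + 1.0
            self.last_seen[table][idx] = t
        return score / len(self.counts)


# Usage: a steady "normal" pattern, then one far-away record.
scorer = DecayedLSHScorer(dim=2)
for step in range(100):
    scorer.score_and_update([0.0, 0.0], step)
normal = scorer.score_and_update([0.0, 0.0], 100)
outlier = scorer.score_and_update([50.0, 50.0], 101)
print(f"normal={normal:.3f} outlier={outlier:.3f}")
```

With these settings the repeated record should score close to 0 and the distant record close to 1, unless a random projection happens to map both into the same bucket. Decaying lazily, only when a bucket is touched, keeps each update at O(num_tables × dim) work regardless of how many records have arrived, which is one plausible way to approach the constant time and memory goal stated in the summary.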
DOI: 10.1016/j.jestch.2025.102119