MQTTEEB-D: A high-fidelity benchmark for real-time MQTT anomaly detection using machine learning techniques

Bibliographic details
Published in: Ad Hoc Networks, Volume 181, p. 104062
Main authors: Allaga, Hamza; Biniz, Mohamed; Farchane, Abderrazak
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 February 2026
ISSN: 1570-8705
Description
Summary: Message Queuing Telemetry Transport (MQTT) is essential for resource-constrained Internet of Things (IoT) environments; however, its widespread adoption has introduced significant security vulnerabilities. Although machine learning (ML) offers a promising solution for anomaly detection, existing models are often hindered by unrealistic data, severe class imbalances, and high computational costs. To address these limitations, we present a comprehensive ML framework for MQTT anomaly detection benchmarked on MQTTEEB-D, a high-fidelity dataset from a physical IoT testbed. Our framework evaluates a diverse suite of algorithms, including tree ensembles and boosting methods, on both original imbalanced and balanced data. We assessed performance using standard metrics, imbalance-stable metrics such as the Matthews Correlation Coefficient (MCC), and a Performance–Efficiency Score (PES) to quantify the trade-off between predictive power and computational cost. Our results establish a new state of the art, with the top models achieving over 98.8% accuracy and F1-score. These models also yielded dramatic efficiency gains, including a 43-fold reduction in training time and a 299-fold speedup in inference latency over previous benchmarks. Critically, we found that a model’s resilience to class imbalance is more vital for real-world deployment than its peak performance on artificially balanced data. Simpler tree-based models remained robust under imbalanced conditions, where more complex algorithms failed. These findings provide a new benchmark and reorient model selection towards efficient, reliable, and deployable IoT security systems.
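Note on the MCC metric named in the summary: the sketch below is not the authors' code and does not use MQTTEEB-D data. It shows, under assumed toy labels (0 = normal traffic, 1 = attack), why MCC is described as imbalance-stable while accuracy is not. The function names and the 95/5 class split are illustrative assumptions; only the standard MCC definition from confusion-matrix counts is relied on.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = attack, 0 = normal)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0, given binary labels
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced sample: 95 normal messages, 5 attacks.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that labels everything as normal.
always_normal = [0] * 100
counts = confusion_counts(y_true, always_normal)
acc_degenerate = accuracy(*counts)  # high despite missing every attack
mcc_degenerate = mcc(*counts)       # no better than chance

# A perfect classifier, for comparison.
mcc_perfect = mcc(*confusion_counts(y_true, y_true))
```

In this toy case the degenerate classifier reaches 95% accuracy while its MCC is 0, which is the kind of gap that motivates reporting MCC alongside accuracy on imbalanced traffic.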
DOI: 10.1016/j.adhoc.2025.104062