MHS-VIT: Mamba hybrid self-attention vision transformers for traffic image detection

Bibliographic Details
Published in:	PloS one Vol. 20; No. 6; p. e0325962
Main Authors:	Zhang, Xude; Ou, Weihua; Wu, Xiaoping; Zhang, Changzhen
Format:	Journal Article
Language:	English
Published:	United States: Public Library of Science (PLoS), 30 June 2025
Subjects:	Accuracy; Algorithms; Attention; Biology and Life Sciences; Complexity; Computer and Information Sciences; Computer applications; Computer vision; Computing costs; Deep learning; Engineering and Technology; Humans; Image detection; Image processing; Image Processing, Computer-Assisted - methods; Intelligent transportation systems; Localization; Machine vision; Methods; Models, Theoretical; Neural networks; Object recognition; Physical Sciences; Research and Analysis Methods; Social Sciences; Spatial dependencies; State space models; Technology application; Time series; Traffic; Traffic engineering; Traffic signs; Vision; Visual tasks; China
ISSN:	1932-6203
Online Access:	Full text

Description
Summary:	With the rapid development of intelligent transportation systems, the transformer architecture has substantially improved model performance, particularly in traffic image detection. However, the quadratic complexity of self-attention makes conventional transformer models computationally expensive to train and deploy, which limits their use in resource-constrained environments. To overcome this limitation, this paper proposes a novel hybrid architecture, Mamba Hybrid Self-Attention Vision Transformers (MHS-VIT), which combines the advantages of the Mamba state-space model (SSM) and the transformer to improve modeling efficiency and accuracy on visual tasks, in particular the processing of traffic images. Mamba, an SSM with linear time complexity, can effectively reduce the computational burden without sacrificing performance. The transformer's self-attention mechanism is good at capturing long-distance spatial dependencies in images, which is crucial for understanding complex traffic scenes. Experimental results showed that MHS-VIT exhibited excellent performance in traffic image detection tasks: in vehicle detection, pedestrian detection, and traffic sign recognition, the model identified target objects accurately and quickly. Compared with backbone networks of the same scale, MHS-VIT achieved significant improvements in accuracy and parameter count.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
DOI:	10.1371/journal.pone.0325962
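
Illustrative note: the summary contrasts the quadratic cost of self-attention with the linear cost of a state-space model. The sketch below is only a toy illustration of that scaling argument and is not the MHS-VIT architecture or the authors' code. It assumes scalar tokens, a fixed-parameter recurrence (Mamba itself uses input-dependent, "selective" parameters), and a hypothetical 224-pixel image split into 16-pixel patches; the names `ssm_scan`, `self_attention`, and `token_count` are invented for this example.

```python
import math


def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Fixed-parameter linear recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the work grows linearly with len(xs).
    (Assumption: scalar state and constant a, b, c, unlike Mamba's selective SSM.)
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def self_attention(xs):
    """Naive single-head self-attention over scalar tokens (query = key = value).

    Every token is scored against every token, so len(xs)**2 scores are computed.
    """
    out = []
    for q in xs:
        scores = [q * k for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, xs)) / total)
    return out


def token_count(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    return (image_size // patch_size) ** 2


if __name__ == "__main__":
    n = token_count(224, 16)  # hypothetical ViT-style patching
    print("tokens:", n)
    print("attention score evaluations:", n * n)
    print("recurrence steps:", n)
```

Under these assumptions, a 224 x 224 image yields 196 tokens, so the naive attention loop evaluates 196² = 38,416 scores while the recurrence takes 196 steps. A hybrid design trades between the two: attention for global spatial dependencies, the scan for cost.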