A Secure and Robust Multimodal Framework for In-Vehicle Voice Control: Integrating Bilingual Wake-Up, Speaker Verification, and Fuzzy Command Understanding

Bibliographic Details
Published in:	Eng (Basel, Switzerland) Vol. 6; no. 11; p. 319
Main Authors:	Zhang, Zhixiong, Li, Yao, Ren, Wen, Wang, Xiaoyan
Format:	Journal Article
Language:	English
Published:	Basel: MDPI AG, 01.11.2025
Subjects:	Accuracy; Algorithms; Architecture; Background noise; Design; dual-channel wake-up; ECAPA-TDNN; English language; Fault tolerance; fuzzy command matching; GPIO control; In vehicle; in-vehicle voice control; Intelligent vehicles; Matching; Multilingualism; Robustness (mathematics); Semantics; speaker verification; Voice control
ISSN:	2673-4117

Description
Summary:	Intelligent in-vehicle voice systems face critical challenges in robustness, security, and semantic flexibility under complex acoustic conditions. To address these issues holistically, this paper proposes a multimodal and secure voice-control framework. The system integrates a hybrid dual-channel wake-up mechanism that combines a commercial English engine (Picovoice) with a custom lightweight ResNet-Lite model for Chinese to achieve robust cross-lingual activation. For reliable identity authentication, it introduces an optimized ECAPA-TDNN model enhanced with spectral augmentation, sliding-window feature fusion, and an adaptive threshold mechanism. In addition, a two-tier fuzzy command matching algorithm operating at the character and pinyin levels is designed to improve tolerance to speech variations and automatic speech recognition (ASR) errors. Comprehensive experiments on a test set covering various Chinese dialects, English accents, and noise environments show that the proposed system achieves high performance across all components: the wake-up mechanism maintains commercial-grade reliability for English and provides a functional baseline for Chinese; the improved ECAPA-TDNN attains equal error rates (EERs) of 2.37% (quiet), 5.59% (background music), and 3.12% (high-speed noise), outperforming standard baselines and showing strong noise robustness compared with state-of-the-art methods; and the fuzzy matcher raises command recognition accuracy above 95.67% in quiet environments and above 92.7% under noise, outperforming hard matching by approximately 30%. End-to-end tests confirm an overall interaction success rate of 93.7%. This work offers a practical, integrated solution for developing secure, robust, and flexible voice interfaces in intelligent vehicles.
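The two-tier fuzzy matching idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a character-level pass scored with `difflib.SequenceMatcher` and, only when that pass fails, a second pass over toneless pinyin syllables. The command list, the tiny `PINYIN` table, the helper names, and the 0.8 thresholds are all hypothetical illustration values, not the authors' algorithm or parameters; a real system would replace the table with a complete converter such as the `pypinyin` package.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toneless pinyin table that covers only the demo characters.
# A real system would use a complete converter (e.g. the pypinyin package).
PINYIN: Dict[str, str] = {
    "打": "da", "开": "kai", "关": "guan", "闭": "bi",
    "车": "che", "窗": "chuang", "床": "chuang",
    "空": "kong", "调": "tiao",
}

# Hypothetical command vocabulary: open window, close window, turn on A/C.
COMMANDS = ["打开车窗", "关闭车窗", "打开空调"]


def to_pinyin(text: str) -> List[str]:
    """Map each character to a toneless syllable; unknown characters pass through."""
    return [PINYIN.get(ch, ch) for ch in text]


def best_candidate(query: Sequence, candidates: Dict[str, Sequence]) -> Tuple[Optional[str], float]:
    """Return the command whose representation is most similar to the query."""
    best_cmd, best_score = None, 0.0
    for cmd, rep in candidates.items():
        score = SequenceMatcher(None, query, rep).ratio()  # 2 * matches / total length
        if score > best_score:
            best_cmd, best_score = cmd, score
    return best_cmd, best_score


def match_command(text: str, commands: List[str],
                  char_threshold: float = 0.8,
                  pinyin_threshold: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Two-tier fuzzy match: characters first, then pinyin syllables as a fallback."""
    # Tier 1: compare raw character strings.
    cmd, score = best_candidate(text, {c: c for c in commands})
    if cmd is not None and score >= char_threshold:
        return cmd, "char", score
    # Tier 2: compare toneless pinyin sequences, which absorbs homophone ASR errors.
    cmd, score = best_candidate(to_pinyin(text), {c: to_pinyin(c) for c in commands})
    if cmd is not None and score >= pinyin_threshold:
        return cmd, "pinyin", score
    return None  # nothing is close enough: reject rather than guess


print(match_command("打开空调", COMMANDS))  # ('打开空调', 'char', 1.0)
print(match_command("打开车床", COMMANDS))  # 床 misrecognized for 窗: ('打开车窗', 'pinyin', 1.0)
print(match_command("播放音乐", COMMANDS))  # out-of-vocabulary request: None
```

In this sketch the cheap character tier runs first so exact and near-exact transcriptions (for example a dropped character) are accepted immediately, and the pinyin tier only runs when characters disagree, which is where Mandarin homophone errors such as 床 (chuáng) for 窗 (chuāng) tend to appear.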
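The equal error rates reported for the speaker verification module correspond to the decision threshold at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The Python sketch below is a simplified illustration, not the paper's evaluation code: it sweeps the observed scores as candidate thresholds and reports the point where FAR and FRR are closest. The score lists are invented for the example and have no relation to the reported results.

```python
from typing import Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Error rates when a trial is accepted if its score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # true speakers wrongly rejected
    return far, frr


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) at the observed score where |FAR - FRR| is smallest.

    The EER is approximated as the mean of FAR and FRR at that threshold;
    evaluation toolkits usually interpolate the ROC/DET curve instead.
    """
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t


# Synthetic similarity scores (illustrative only).
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.65]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 20.00% at threshold 0.6
```

Raising the threshold trades false acceptances for false rejections, so a lower EER means the genuine and impostor score distributions overlap less.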
DOI:	10.3390/eng6110319