A Lightweight Architecture for Query-by-Example Keyword Spotting on Low-Power IoT Devices

Bibliographic details
Published in: IEEE Transactions on Consumer Electronics, vol. 69, no. 1, pp. 65-75
Main author: Li, Meirong
Format: Journal Article
Language: English
Published: New York, IEEE, 1 February 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0098-3063, 1558-4127
Description
Abstract: Keyword spotting (KWS) is the task of recognizing a keyword or a particular command in a continuous audio stream; it can be applied to voice trigger systems that automatically monitor and process speech signals. This paper focuses on user-defined keyword spotting in low-resource settings. A lightweight neural network architecture is developed for keyword detection using query-by-example (QbyE) techniques. The architecture uses a convolutional recurrent neural network (CRNN) to extract frame-level features from input audio signals. A customized model compression method is proposed to compress the network, making it suitable for low-power settings. During keyword enrollment, all enrolled keyword examples are merged into a single keyword template, which is then used to detect the target keyword during keyword search. To improve the efficiency of keyword search, a segmental local normalized dynamic time warping (DTW) algorithm is introduced. Experiments on real-world datasets show that the approach consistently outperforms state-of-the-art methods, and that the proposed system runs on an ARM Cortex-A7 processor and achieves real-time keyword detection.
DOI: 10.1109/TCE.2022.3213075
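
Illustrative note: the abstract describes matching a keyword template against audio features with a DTW-based search. The sketch below shows only the classic, textbook form of DTW with a cosine frame distance, as a rough picture of what query-by-example matching involves. It is not the segmental local normalized DTW proposed in the article, whose details are not given in this record. The function names, the length normalization, and the threshold value are all assumptions made for illustration; in the article the frames would be CRNN features, whereas here they are plain lists of floats.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero frame as dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def dtw_cost(template, segment):
    """Classic DTW alignment cost, divided by len(template) + len(segment).

    Hypothetical helper: this is the standard full-sequence recurrence,
    not the segmental local normalized variant named in the abstract.
    """
    n, m = len(template), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost aligning template[:i] with segment[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(template[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def detect_keyword(template, segment, threshold=0.2):
    """Report a detection when the normalized cost is at or below a threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return dtw_cost(template, segment) <= threshold
```

Under these assumptions, a segment that repeats the template's frames at a slower rate aligns with low cost, while a segment whose frames occur in a different order does not. A real system would slide such a comparison over the incoming audio stream, which is the step the article's segmental algorithm is meant to make efficient.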