A Lightweight Architecture for Query-by-Example Keyword Spotting on Low-Power IoT Devices

Bibliographic Details
Published in: IEEE Transactions on Consumer Electronics, Vol. 69, no. 1, pp. 65-75
Main Author: Li, Meirong
Format: Journal Article
Language: English
Published: New York IEEE 01.02.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0098-3063, 1558-4127
Description
Summary: Keyword spotting (KWS) is the task of recognizing a keyword or a particular command in a continuous audio stream; it is well suited to voice trigger systems that automatically monitor and process speech signals. This paper addresses user-defined keyword spotting in low-resource settings. A lightweight neural network architecture is developed for keyword detection using query-by-example (QbyE) techniques. The architecture uses a convolutional recurrent neural network (CRNN) to extract frame-level features from the input audio. A customized model compression method is proposed to compress the network, making it suitable for low-power settings. During keyword enrollment, all enrolled keyword examples are merged into a single keyword template, which is then used to detect the target keyword during keyword search. To make keyword search more efficient, a segmental local normalized dynamic time warping (DTW) algorithm is introduced. Experiments on collected real-world datasets show that the approach consistently outperforms state-of-the-art methods, and that the proposed system runs on an ARM Cortex-A7 processor with real-time keyword detection.
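The enrollment and search steps named in the summary can be illustrated with a minimal sketch. This is not the paper's method: the record does not specify the segmental local normalized DTW, so the code below substitutes classic DTW with a simple length normalization, merges examples by plain frame-wise averaging (assuming equal-length examples), and uses short lists of floats in place of CRNN frame features. All function names and the threshold are hypothetical.

```python
import math

def frame_distance(a, b):
    """Cosine distance between two feature frames (sequences of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (na * nb)

def dtw_cost(template, segment):
    """Classic DTW alignment cost, divided by the summed sequence lengths.

    Assumption: this stands in for the paper's segmental local
    normalized DTW, whose details are not given in this record.
    """
    n, m = len(template), len(segment)
    inf = float("inf")
    # acc[i][j]: minimal accumulated cost aligning template[:i] with segment[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)

def merge_examples(examples):
    """Average equal-length enrolled examples frame by frame into one template."""
    count = len(examples)
    return [
        [sum(ex[t][k] for ex in examples) / count
         for k in range(len(examples[0][t]))]
        for t in range(len(examples[0]))
    ]

def detect(template, stream, window, threshold):
    """Slide a fixed-size window over the stream and return the start
    indices whose normalized DTW cost falls below the threshold."""
    hits = []
    for start in range(len(stream) - window + 1):
        if dtw_cost(template, stream[start:start + window]) < threshold:
            hits.append(start)
    return hits

# Toy usage: two identical two-frame enrollments, then a six-frame stream.
enrolled = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
template = merge_examples(enrolled)
stream = [[0, 1], [0, 1], [1, 0], [0, 1], [1, 0], [1, 0]]
hits = detect(template, stream, window=2, threshold=0.1)
```

Merging the examples into one template means each search window needs a single DTW pass rather than one pass per enrolled example, which is presumably part of what makes the single-template design attractive on a low-power processor.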
DOI: 10.1109/TCE.2022.3213075