Text-dependent speaker verification using discrete wavelet transform based on linear prediction coding


Bibliographic Details
Published in: Biomedical Signal Processing and Control, Vol. 86, p. 105218
Main Authors: Ketabi, Sina; Rashidi, Saeid; Fallah, Ali
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.09.2023
ISSN: 1746-8094, 1746-8108
Description
Summary:
• Text-dependent speaker verification is the process of verifying a person's claimed identity from utterances pre-defined for the system.
• This can be done with the standard procedure or with a novel method called one-for-all classification.
• In this method, the differences between the features and reference values are used, and only one binary classification is performed for all speakers of the dataset.
• Using this method yields significant improvements in system performance along with an increase in classification duration.
Building a system that verifies a person's identity, records the data in the simplest possible way, achieves acceptable accuracy, and remains robust to attacks and noise is an important challenge today. This paper presents a text-dependent speaker verification system in which features are extracted from the data using a method called discrete wavelet transform based on linear prediction coding. In addition to the conventional classification used in these systems, a new classification scheme, in which a single binary classification is performed for all speakers of the dataset, is also introduced in this research. The performance of the designed system is then evaluated on the ASVspoof 2015, ASVspoof 2017, ASVspoof 2019, AudioMNIST, and TIMIT datasets using accuracy, precision, sensitivity, specificity, F1-score, and equal error rate (EER). When classification is performed once per speaker, the EER is 0.14±0.83% on the ASVspoof 2017 dataset; when classification is performed once for all speakers, the accuracy and EER are 100±0.00% and 0.00±0.00% on all datasets.
DOI: 10.1016/j.bspc.2023.105218
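
The abstract names a feature extractor that combines the discrete wavelet transform with linear prediction coding. A minimal sketch of one plausible reading, computing LPC coefficients on each DWT subband and concatenating them, is below; the wavelet family (db4), decomposition level, and LPC order are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import pywt
from scipy.linalg import solve_toeplitz

def lpc_coefficients(x, order=12):
    """LPC predictor coefficients via the autocorrelation (Yule-Walker) method."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    # Solve the symmetric Toeplitz system R a = r[1:] for the predictor.
    return solve_toeplitz((r[:order], r[:order]), r[1 : order + 1])

def dwt_lpc_features(signal, wavelet="db4", level=3, lpc_order=12):
    """Concatenate LPC coefficients computed on every DWT subband."""
    subbands = pywt.wavedec(signal, wavelet, level=level)
    return np.concatenate([lpc_coefficients(sb, lpc_order) for sb in subbands])

# Example: a (level + 1) * lpc_order feature vector for one synthetic utterance.
features = dwt_lpc_features(np.random.randn(16000))
```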
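The one-for-all scheme replaces per-speaker models with a single binary decision over feature-reference differences. The sketch below assumes the reference is the mean of each speaker's enrollment features and that an SVM serves as the binary classifier; both choices are assumptions for illustration, since the abstract only states that differences between features and reference values feed one classifier for all speakers.

```python
import numpy as np
from sklearn.svm import SVC

def enrollment_references(enroll_feats):
    """Per-speaker reference vector: mean of enrollment features (an assumption)."""
    return {spk: np.mean(feats, axis=0) for spk, feats in enroll_feats.items()}

def difference_set(trials, references):
    """trials: iterable of (claimed_speaker, feature_vector, is_genuine)."""
    X = np.array([np.abs(f - references[spk]) for spk, f, _ in trials])
    y = np.array([int(genuine) for _, _, genuine in trials])
    return X, y

# One binary classifier serves every speaker in the dataset:
# X_train, y_train = difference_set(train_trials, enrollment_references(enroll_feats))
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```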
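EER, the abstract's headline metric, is the operating point where the false acceptance rate equals the false rejection rate. A standard way to compute it from trial scores and ground-truth labels:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: rate at the threshold where false acceptance equals false rejection."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # closest point to FAR == FRR
    return (fpr[idx] + fnr[idx]) / 2
```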