Text-dependent speaker verification using discrete wavelet transform based on linear prediction coding
| Published in: | Biomedical Signal Processing and Control, Vol. 86, p. 105218 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.09.2023 |
| Summary: | • Text-dependent speaker verification is the process of verifying a person's claimed identity from utterances pre-defined for the system. • This can be done using the standard per-speaker procedure or a novel method called one-for-all classification. • In the latter, the differences between the features and reference values are used, and only one binary classification is performed for all speakers of the dataset. • This method yields significant improvements in system performance, together with an increase in classification duration. |
| Abstract: | Designing an identity verification system whose data can be recorded as simply as possible, that verifies identity with acceptable accuracy, and that is robust to attacks and noise is an important problem today. This paper presents a text-dependent speaker verification system in which features are extracted from the data using a method called discrete wavelet transform based on linear prediction coding. In addition to the conventional classification used in these systems, a new method in which one binary classification is performed for all speakers of the dataset is also introduced in this research. The performance of the designed system on the ASVspoof 2015, ASVspoof 2017, ASVspoof 2019, AudioMNIST, and TIMIT datasets is then evaluated using accuracy, precision, sensitivity, specificity, F1-score, and equal error rate (EER) metrics. It is shown that when classification is performed once per speaker, the EER is 0.14±0.83% on the ASVspoof 2017 dataset, and when classification is performed once for all speakers, the accuracy and EER are 100±0.00% and 0.00±0.00%, respectively, on all datasets. |
| ISSN: | 1746-8094, 1746-8108 |
| DOI: | 10.1016/j.bspc.2023.105218 |
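
The record describes the feature extractor only by name, so the following is a minimal sketch of the general idea rather than the paper's implementation: decompose each utterance with a discrete wavelet transform and fit linear-prediction coefficients to every sub-band. The wavelet family (db4), decomposition level, and LPC order below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of DWT-based LPC feature extraction; db4, level=3,
# and order=10 are assumed for illustration, not taken from the paper.
import numpy as np
import pywt  # PyWavelets


def lpc(x, order):
    """LPC coefficients via the autocorrelation (Levinson-Durbin) method."""
    r = [float(np.dot(x[: len(x) - k], x[k:])) for k in range(order + 1)]
    a = [1.0]
    err = r[0] + 1e-12  # small floor guards near-silent sub-bands
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return np.array(a[1:])  # drop the leading 1


def dwt_lpc_features(x, wavelet="db4", level=3, order=10):
    """Concatenate LPC coefficients fitted to each DWT sub-band."""
    bands = pywt.wavedec(x, wavelet, level=level)  # [cA_n, cD_n, ..., cD_1]
    return np.concatenate([lpc(band, order) for band in bands])


x = np.random.randn(16000)        # stand-in for a 1-second, 16 kHz utterance
print(dwt_lpc_features(x).shape)  # (level + 1) * order = 40 features
```

Each decomposition level contributes one detail sub-band plus the final approximation, so the feature dimension grows linearly with both the decomposition level and the LPC order.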
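The summary describes the one-for-all classification only at a high level: trials are represented by the difference between their features and the claimed speaker's reference values, and a single binary genuine-vs-impostor classifier serves every speaker. The sketch below illustrates that idea under stated assumptions; the mean-based reference vector and the RBF-kernel SVM are hypothetical choices, not details from the paper.

```python
# Hypothetical sketch of one-for-all verification: one shared binary
# classifier over |trial features - claimed speaker's reference|.
import numpy as np
from sklearn.svm import SVC


def reference_vector(enroll_feats):
    """Per-speaker reference: mean of that speaker's enrollment features."""
    return enroll_feats.mean(axis=0)


def difference_trials(feats, claims, references):
    """|trial feature - reference of the claimed speaker|, one row per trial."""
    return np.abs(feats - np.stack([references[c] for c in claims]))


# Toy usage with one synthetic speaker and 40-dimensional features.
rng = np.random.default_rng(0)
spk1_voice = rng.normal(size=40)                      # latent "true" voice
enroll = spk1_voice + 0.1 * rng.normal(size=(5, 40))  # 5 enrollment clips
references = {"spk1": reference_vector(enroll)}

# Genuine trials cluster near the claimed reference; impostor trials do not.
genuine = spk1_voice + 0.1 * rng.normal(size=(50, 40))
impostor = rng.normal(size=(50, 40))

X = difference_trials(np.vstack([genuine, impostor]), ["spk1"] * 100, references)
y = np.array([1] * 50 + [0] * 50)   # 1 = genuine, 0 = impostor

clf = SVC(kernel="rbf").fit(X, y)   # one classifier for all speakers
print(clf.score(X, y))              # training accuracy on the toy data
```

Because the classifier operates on difference vectors rather than raw features, enrolling a new speaker only requires computing a reference vector; no per-speaker model has to be trained, which is presumably what allows a single classification to cover all speakers of a dataset.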