An intelligent speech enhancement model using enhanced heuristic-based residual convolutional neural network with encoder-decoder architecture

Bibliographic Details
Published in: International journal of speech technology Vol. 27; no. 3; pp. 637-656
Main Authors: Balasubrahmanyam, M., Valarmathi, R. S.
Format: Journal Article
Language: English
Published: New York Springer US 01.09.2024
Springer Nature B.V
ISSN: 1381-2416, 1572-8110
Description
Summary: Human listeners often struggle to understand speech in the presence of background or other noise. Speech Enhancement (SE) is the process of improving the quality of a speech signal by suppressing such noise without degrading the information the signal carries. It is therefore used in hearing aids, speech recognition, and similar applications. Recent research has sought to increase speech intelligibility, and supervised learning methods have achieved substantial success. Nevertheless, existing approaches have shortcomings such as high error and a heavy computational burden. To overcome these, deep learning methods are widely used in SE to estimate the spectrogram magnitude when reconstructing the source signal with the noise removed. Acquiring a clean speech signal nonetheless remains a challenging task. Although these methods aim to make speech clearer and more intelligible, they can introduce intricacies that degrade quality and efficiency. Despite the useful structures and resources already available, there is still scope for developing a novel SE model. To address these difficulties, this paper presents an SE approach that uses deep learning to denoise noisy speech and produce high-quality speech. First, speech signals containing noise such as cooler or fan noise, restaurant noise, railway station noise, factory noise, traffic noise during travel, bus-stand noise, cinema theater noise, and crowded-area noise are gathered from standard online sources.
Next, the noisy speech signal is passed to an Adaptive Residual Convolutional Neural Network with Encoder-Decoder (ARCNNetED) architecture, whose parameters are optimized by the Random Revised Drawer Algorithm (RRDA). The proposed ARCNNetED removes the noise present in the input speech signal and thereby enhances speech quality. Finally, the performance of the proposed speech enhancement approach is compared against various conventional methods using several metrics. The developed model performs better in terms of MAE, RMSE, and PSNR, with values of 0.03, 0.19, and 62.26, respectively. By reducing the error rate, the approach supports more accurate outcomes in speech recognition frameworks.
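
The three metrics named in the summary can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy sample values, and the `peak` value used for PSNR are all assumptions made for illustration, and the summary does not state how the paper normalizes its signals or defines the peak.

```python
import math


def _check(clean, enhanced):
    # Both signals must be non-empty and sample-aligned (assumption of this sketch).
    if len(clean) == 0 or len(clean) != len(enhanced):
        raise ValueError("signals must be non-empty and of equal length")


def mae(clean, enhanced):
    """Mean absolute error between clean and enhanced samples."""
    _check(clean, enhanced)
    return sum(abs(c - e) for c, e in zip(clean, enhanced)) / len(clean)


def mse(clean, enhanced):
    """Mean squared error between clean and enhanced samples."""
    _check(clean, enhanced)
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)


def rmse(clean, enhanced):
    """Root mean squared error."""
    return math.sqrt(mse(clean, enhanced))


def psnr(clean, enhanced, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).

    `peak` is the maximum possible sample amplitude; 1.0 is an assumed
    default for signals normalized to [-1, 1], not a value from the paper.
    """
    error = mse(clean, enhanced)
    if error == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical toy samples, not data from the paper.
clean = [0.0, 0.5, -0.5, 1.0]
enhanced = [0.1, 0.4, -0.5, 0.8]

print(mae(clean, enhanced), rmse(clean, enhanced), psnr(clean, enhanced))
```

Lower MAE and RMSE and higher PSNR indicate that the enhanced signal is closer to the clean reference. PSNR values are only comparable across systems when the same peak amplitude is assumed.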
DOI:10.1007/s10772-024-10127-3