Audio based depression detection using Convolutional Autoencoder

•A novel audio-based depression detection system using Convolutional Autoencoder.•Convolutional Autoencoder for extracting highly correlated and compact feature set.•Thorough experimental study based on a real-world depression detection dataset.•Complete comparison of proposed feature extraction met...

Detailed description

Bibliographic details
Published in: Expert Systems with Applications, Vol. 189; p. 116076
Main authors: Sardari, Sara; Nakisa, Bahareh; Rastgoo, Mohammed Naim; Eklund, Peter
Format: Journal Article
Language: English
Published: New York, Elsevier Ltd, 1 March 2022
Elsevier BV
Subjects:
ISSN: 0957-4174, 1873-6793
Online access: Full text
Abstract
• A novel audio-based depression detection system using Convolutional Autoencoder.
• Convolutional Autoencoder for extracting highly correlated and compact feature set.
• Thorough experimental study based on a real-world depression detection dataset.
• Complete comparison of proposed feature extraction method with other techniques.

Depression is a serious and common psychological disorder that requires early diagnosis and treatment. In severe episodes the condition may result in suicidal thoughts. Recently, the need for an effective audio-based Automatic Depression Detection (ADD) system has sparked the interest of the research community. To date, most reported approaches to recognizing depression rely on hand-crafted feature extraction for audio data representation. They combine a wide variety of audio-related features to improve classification performance. However, combining many hand-crafted features, both relevant and less relevant, enlarges the feature space, which can lead to high-dimensionality issues because not all features carry significant information about depression. A high number of features can make pattern recognition more difficult and increase the risk of overfitting. To overcome these limitations, we propose an audio-based depression detection framework that adapts a deep learning (DL) technique to automatically extract a highly relevant and compact feature set. The proposed framework uses an end-to-end Convolutional Neural Network-based Autoencoder (CNN AE) technique to learn highly relevant and discriminative features from raw sequential audio data, and hence to detect depressed people more accurately. In addition, to address the sample imbalance problem, we use a cluster-based sampling technique, which substantially reduces the risk of bias towards the majority class (non-depressed). To evaluate the performance and effectiveness of the proposed pipeline, we perform experiments on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset and compare the results with hand-crafted feature extraction methods and other leading studies in this domain. The results show that the proposed method outperforms other well-known audio-based ADD models, with at least a 7% improvement in F-measure for classifying depression.
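
The abstract mentions a cluster-based sampling technique for the class imbalance but does not spell out the procedure. The following is only a minimal sketch of one common variant, assuming k-means clustering of the majority class and keeping the real sample nearest to each centroid; the function names `kmeans` and `cluster_undersample`, the label convention (0 = non-depressed) and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every centroid, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def cluster_undersample(X, y, majority_label=0, seed=0):
    """Shrink the majority class to at most the minority-class size.

    Assumption (not taken from the paper): the majority class is clustered
    into as many groups as there are minority samples, and the real sample
    closest to each centroid is kept as that cluster's representative.
    """
    maj_mask = y == majority_label
    X_maj, X_min, y_min = X[maj_mask], X[~maj_mask], y[~maj_mask]
    k = len(X_min)
    if k == 0 or len(X_maj) <= k:
        return X, y  # nothing to rebalance
    centroids = kmeans(X_maj, k, seed=seed)
    dist = np.linalg.norm(X_maj[:, None, :] - centroids[None, :, :], axis=2)
    # Index of the nearest majority sample per centroid; duplicates collapse.
    keep = np.unique(dist.argmin(axis=0))
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.full(len(keep), majority_label), y_min])
    return X_bal, y_bal
```

Calling `cluster_undersample(X, y)` on a feature matrix `X` with binary labels `y` returns a reduced set in which the majority class has no more samples than the minority class. Keeping real samples rather than synthetic centroids is a design choice made here for illustration only.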
ArticleNumber 116076
Author Rastgoo, Mohammed Naim
Sardari, Sara
Eklund, Peter
Nakisa, Bahareh
Author_xml – sequence: 1
  givenname: Sara
  surname: Sardari
  fullname: Sardari, Sara
  email: sara.sardari@shirazu.ac.ir
  organization: Computer Science, Engineering and IT Department, Shiraz University, Shiraz, Iran
– sequence: 2
  givenname: Bahareh
  surname: Nakisa
  fullname: Nakisa, Bahareh
  email: Bahar.nakisa@deakin.edu.au
  organization: School of Information Technology, Faculty of Science Engineering and Built Environment, Deakin University, Vic, Australia
– sequence: 3
  givenname: Mohammed Naim
  surname: Rastgoo
  fullname: Rastgoo, Mohammed Naim
  email: mohammadnaim.rastgoo@qut.edu.au
  organization: School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
– sequence: 4
  givenname: Peter
  surname: Eklund
  fullname: Eklund, Peter
  email: peter.eklund@deakin.edu.au
  organization: School of Information Technology, Faculty of Science Engineering and Built Environment, Deakin University, Vic, Australia
Cites_doi 10.1038/s41598-020-74399-w
10.1109/ACCESS.2019.2951750
10.1109/JBHI.2018.2866873
10.1016/j.eswa.2021.114693
10.21437/Interspeech.2017-1421
10.1016/j.specom.2015.03.004
10.1145/3186585
10.2174/1567205014666171120143800
10.1016/j.eswa.2019.07.010
10.3390/e22060688
10.1016/j.engappai.2018.09.018
10.1016/j.jad.2008.06.026
10.1145/3107990.3108004
10.1145/2661806.2661807
10.1016/j.csl.2018.07.007
10.1098/rsta.2015.0202
10.1109/ACCESS.2020.3027026
10.1007/s41666-019-00061-4
10.25080/Majora-7b98e3ed-003
10.1016/j.aquaeng.2020.102053
10.1109/TASLP.2019.2938863
10.1109/SCIS-ISIS.2018.00023
10.1145/3266302.3266316
10.1145/3133944.3133953
10.1016/j.ecoinf.2020.101084
10.1016/j.ins.2017.05.008
10.1145/2512530.2512533
10.1037/t00742-000
10.1016/j.media.2017.08.005
10.1371/journal.pone.0144610
10.1145/2988257.2988258
10.1186/s13636-020-00182-4
10.1109/JBHI.2019.2938247
10.1016/j.eswa.2017.09.062
10.1109/ACCESS.2018.2833746
10.1109/ACCESS.2018.2868361
10.1371/journal.pmed.0030442
10.1016/j.buildenv.2017.06.048
10.21437/Interspeech.2015-184
10.1109/ACCESS.2020.2970836
ContentType Journal Article
Copyright 2021 Elsevier Ltd
Copyright Elsevier BV Mar 1, 2022
DOI 10.1016/j.eswa.2021.116076
DatabaseName CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Computer Science
EISSN 1873-6793
ExternalDocumentID 10_1016_j_eswa_2021_116076
S0957417421014147
ISICitedReferencesCount 76
ISSN 0957-4174
IsPeerReviewed true
IsScholarly true
Keywords Audio depression detection
Early depression detection
Semi-supervised learning
Convolutional Autoencoder
Language English
PublicationDate 2022-03-01
PublicationPlace New York
PublicationTitle Expert systems with applications
PublicationYear 2022
Publisher Elsevier Ltd
Elsevier BV
  publication-title: Canadian Agency for Drugs and Technologies in Health
– start-page: 45
  year: 2017
  end-page: 51
  ident: b0300
  article-title: Hybrid depression classification and estimation from audio video and text information
  publication-title: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge
– volume: 2018
  start-page: 3398
  year: 2018
  end-page: 3402
  ident: b0185
  article-title: Multi-lingual depression-level assessment from conversational speech using acoustic and text features
  publication-title: Proceedings of Interspeech
– start-page: 7124
  year: 2020
  end-page: 7128
  ident: b0035
  article-title: Pyannote. audio: neural building blocks for speaker diarization
  publication-title: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
– reference: Ringeval, F., Schuller, B., Valstar, M., Cowie, R., Kaya, H., Schmitt, M., Amiriparian, S., Cummins, N., Lalanne, D., Michaud, A. and Çiftçi, E., 2018, October. AVEC 2018 workshop and challenge: Bipolar disorder and cross-cultural affect recognition. Proceedings of the 2018 on audio/visual emotion challenge and workshop, pp. 3-13.
– reference: Demiroglu, C., Beşirli, A., Ozkanca, Y., Çelik, S., 2020. Depression-level assessment from multi-lingual conversational speech data using acoustic and text features. Journal on Audio, Speech, and Music Processing. 2020, 17 (2020). 10.1186/s13636-020-00182-4.
– volume: 2
  start-page: 1
  year: 2015
  end-page: 18
  ident: b0010
  article-title: Variational autoencoder based anomaly detection using reconstruction probability
  publication-title: Special Lecture on IE
– volume: 22
  start-page: 688
  year: 2020
  ident: b0265
  article-title: Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks
  publication-title: Entropy
– volume: 23
  start-page: 2265
  year: 2019
  end-page: 2275
  ident: b0310
  article-title: Multimodal depression detection: Fusion of electroencephalography and paralinguistic behaviors using a novel strategy for classifier ensemble
  publication-title: IEEE Journal of Biomedical and Health Informatics
– reference: Valstar, M., Schuller, B., Smith, K., Almaev, T., Eyben, F., Krajewski, J., Cowie, R. and Pantic, M., 2014, November. Avec 2014: 3d dimensional affect and depression recognition challenge. Proceedings of the 4th international workshop on audio/visual emotion challenge, pp. 3-10.
– volume: 6
  start-page: 49325
  year: 2018
  end-page: 49338
  ident: b0160
  article-title: Long short term memory hyperparameter optimization for a neural network based emotion recognition framework
  publication-title: IEEE Access
– volume: 27
  start-page: 2041
  year: 2019
  end-page: 2053
  ident: b0055
  article-title: Unsupervised speech representation learning using wavenet autoencoders
  publication-title: IEEE/ACM transactions on audio, speech, and language processing
– volume: 71
  start-page: 10
  year: 2015
  end-page: 49
  ident: b0060
  article-title: A review of depression and suicide risk assessment using speech analysis
  publication-title: Speech Communication
– reference: Zlotnik, A., Montero, J.M., San-Segundo, R. and Gallardo-Antolín, A., 2015. Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features. INTERSPEECH-2015, 503- 507.
– start-page: 35
  year: 2016
  end-page: 42
  ident: b0125
  article-title: Depaudionet: An efficient deep model for audio based depression classification
  publication-title: Proceedings of the 6th international workshop on audio/visual emotion challenge
– reference: Beck, A. T., Steer, R. A., Brown, G. K., 1996. Beck depression inventory.
– reference: arXiv preprint arXiv:2006.10417.
– reference: Ringeval, F., Schuller, B., Valstar, M., Gratch, J., Cowie, R., Scherer, S., Mozgai, S., Cummins, N., Schmitt, M., Pantic, M., 2017, October. Avec 2017: Real-life depression, and affect recognition workshop and challenge. In
– reference: .
– volume: 114
  start-page: 163
  year: 2009
  end-page: 173
  ident: b0100
  article-title: The PHQ-8 as a measure of current depression in the general population
  publication-title: Journal of affective disorders
– volume: 89
  year: 2020
  ident: b0015
  article-title: Deep learning-based appearance features extraction for automated carp species identification
  publication-title: Aquacultural Engineering
– reference: Chollet, F. 2015. Keras. Available online at:
– start-page: 52
  year: 2011
  end-page: 59
  ident: b0130
  article-title: Stacked convolutional auto-encoders for hierarchical feature extraction
  publication-title: International conference on artificial neural networks
– volume: 409
  start-page: 17
  year: 2017
  end-page: 26
  ident: b0115
  article-title: Clustering-based undersampling in class-imbalanced data
  publication-title: Information Sciences
– reference: McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., Nieto, O., 2015, July. librosa: Audio and music signal analysis in python. Proceedings of the 14th python in science conference, Vol. 8, pp. 18- 25.
– volume: 42
  start-page: 200
  year: 2017
  end-page: 211
  ident: b0315
  article-title: Constructing fine-granularity functional brain network atlases via deep convolutional autoencoder
  publication-title: Medical Image Analysis
– reference: arXiv preprint arXiv:1806.02146.
– volume: 138
  year: 2019
  ident: b0205
  article-title: Automatic driver stress level classification using multimodal deep learning
  publication-title: Expert Systems with Applications
Start page: 116076
Subject terms:
Artificial neural networks
Audio data
Audio depression detection
Classification
Convolutional Autoencoder
Early depression detection
Feature extraction
Machine learning
Mental disorders
Pattern recognition
Performance evaluation
Sampling methods
Semi-supervised learning
Title: Audio based depression detection using Convolutional Autoencoder
URI: https://dx.doi.org/10.1016/j.eswa.2021.116076
https://www.proquest.com/docview/2617690912
Volume: 189