Cross-modal emotion hashing network: Efficient binary coding for large-scale multimodal emotion retrieval


Published in: Knowledge-based systems, Volume 331, article 114771
Main authors: Li, Chenhao; Huang, Wenti; Yang, Zhan; Long, Jun
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 03.12.2025
ISSN: 0950-7051
Online access: Get full text
Abstract
•Reframe multimodal emotion recognition as affect-oriented binary hashing with the Cross-Modal Emotion Hashing Network (CEHN).
•Employ a three-hop gated cross-modal attention hierarchy to fuse audio, visual, and textual cues efficiently.
•Use polarity-aware contrastive distillation to produce balanced 32–128-bit codes whose Hamming distances preserve emotion similarity.
•On MELD and CMU-MOSI, a 64-bit CEHN variant matches dense-vector baselines while shrinking memory up to 256× and cutting CPU latency to <3 ms.
•Enable billion-scale, privacy-preserving retrieval on commodity hardware.
Multimodal emotion recognition (MER) remains computationally heavy because speech, vision, and language features are high-dimensional. We reframe MER as affect-oriented retrieval and propose a Cross-Modal Emotion Hashing Network (CEHN), which encodes each utterance into compact binary codes that preserve fine-grained emotional semantics. CEHN couples pre-trained acoustic, visual, and textual encoders with gated cross-modal attention, and then applies a polarity-aware distillation loss that aligns continuous affect vectors with a temperature-controlled sign layer. The result is 32–128-bit hash codes whose Hamming distances approximate emotion similarity. On MELD and CMU-MOSI, CEHN matches or surpasses dense state-of-the-art models, improving F1 by up to 1.9 points while shrinking representation size by up to 16× and reducing CPU inference to <2 ms per utterance. Latency-matched quantization baselines trail by as much as 5 mAP points. Ablations confirm the critical roles of gated attention and polarity-aware distillation. By uniting efficient hashing with modality-aware fusion, CEHN enables real-time, privacy-preserving emotion understanding for dialogue systems, recommendation engines, and safety-critical moderation on resource-constrained devices.
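The retrieval described in the abstract reduces to comparing binary codes by Hamming distance (XOR then popcount), which is why a single 64-bit integer can replace a multi-kilobyte float vector. A minimal illustrative sketch with toy codes, not the paper's implementation:

```python
# Hamming-distance retrieval over 64-bit binary codes, as produced by
# hashing models like CEHN. The codes below are hypothetical toy data.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes: popcount of XOR."""
    return bin(a ^ b).count("1")

def nearest(query: int, database: list[int]) -> int:
    """Index of the database code closest to the query in Hamming space."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))

db = [0x0F0F0F0F0F0F0F0F,   # differs from query in 32 bits
      0xFFFFFFFFFFFFFFFE,   # differs from query in 1 bit
      0xAAAAAAAAAAAAAAAA]   # differs from query in 32 bits
query = 0xFFFFFFFFFFFFFFFF
print(nearest(query, db))  # → 1
```

Each comparison is one XOR and one popcount per 64-bit word, which is what makes CPU latencies in the low milliseconds and billion-scale search (as in the FAISS work the record cites) feasible.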
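The abstract's "temperature-controlled sign layer" is not specified in this record; one common realization, in the spirit of the HashNet-style continuation the article cites, relaxes sign(x) to tanh(βx) and anneals the temperature β so the relaxation hardens toward binary outputs during training. A sketch under that assumption:

```python
import math

# Temperature-controlled relaxation of sign(x), an assumed reading of the
# abstract's "temperature-controlled sign layer" (not the paper's code).

def soft_sign(x: float, beta: float) -> float:
    """tanh(beta * x): smooth and differentiable for small beta,
    approaching the hard sign(x) in {-1, +1} as beta grows."""
    return math.tanh(beta * x)

x = 0.3
for beta in (1.0, 10.0, 100.0):
    print(f"beta={beta:6.1f}  soft_sign={soft_sign(x, beta):+.4f}")
# As beta increases, the output saturates toward +1 = sign(0.3), so the
# learned continuous affect vector converges to a binary hash bit.
```

Annealing β lets gradients flow early in training while ensuring the final codes are effectively binary, which is the usual workaround for the non-differentiability of sign.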
ArticleNumber 114771
Author Huang, Wenti
Yang, Zhan
Long, Jun
Li, Chenhao
Author_xml – sequence: 1
  givenname: Chenhao
  orcidid: 0009-0005-8903-0955
  surname: Li
  fullname: Li, Chenhao
  email: li0991xju@163.com
  organization: Computer Science and Technology, Xinjiang University, Urumqi, 830000, China
– sequence: 2
  givenname: Wenti
  surname: Huang
  fullname: Huang, Wenti
  organization: School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411100, China
– sequence: 3
  givenname: Zhan
  orcidid: 0000-0002-6336-0228
  surname: Yang
  fullname: Yang, Zhan
  organization: Big Data Institute, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
– sequence: 4
  givenname: Jun
  orcidid: 0000-0003-0163-0007
  surname: Long
  fullname: Long, Jun
  email: junlong@hnu.edu.cn
  organization: Big Data Institute, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
BookMark eNp9kL1OwzAUhT0UiRZ4Awa_QILtOHHCgISq8iNVYoHZcuzr1m1iIzsU9e1JFBYWpjuce47O-VZo4YMHhG4pySmh1d0hP_qQzilnhJU5pVwIukBL0pQkE6Skl2iV0oEQwhitl8itY0gp64NRHYY-DC54vFdp7_wOexi-Qzze4421TjvwA26dV_GMdTDTgw0RdyruIEtadYD7r25wf7MiDNHBSXXX6MKqLsHN771CH0-b9_VLtn17fl0_bjNNBRsyS1nFua6YMm3DTWVJWzPBFajGaLDCUmIVAC_Kmo5SXZKisJUirGmZKI0qrhCfc_W0LIKVn9H1Y2lJiZwQyYOcEckJkZwRjbaH2QZjt5ODKNO0WINxEfQgTXD_B_wA1Gh4UQ
Cites_doi 10.1016/j.patcog.2023.110079
10.1109/TPAMI.2018.2798607
10.1609/aaai.v39i2.32135
10.1007/s11042-024-19371-w
10.1109/TBDATA.2019.2921572
10.1109/JPROC.2017.2761740
10.1016/j.inffus.2017.02.003
10.1145/3610661.3616190
10.1037/h0077714
10.1016/j.knosys.2024.112599
10.1016/j.bspc.2025.108231
10.1145/3632527
10.1109/TPAMI.2010.57
10.1109/TAFFC.2024.3394873
ContentType Journal Article
Copyright 2025 Elsevier B.V.
Copyright_xml – notice: 2025 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.knosys.2025.114771
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
ExternalDocumentID 10_1016_j_knosys_2025_114771
S095070512501809X
GroupedDBID --K
--M
.DC
.~1
0R~
1B1
1~.
1~5
4.4
457
4G.
5VS
7-5
71M
77I
77K
8P~
9JN
AAEDT
AAEDW
AAIKJ
AAKOC
AALRI
AAOAW
AAQFI
AATTM
AAXKI
AAXUO
AAYFN
AAYWO
ABAOU
ABBOA
ABIVO
ABJNI
ABMAC
ACDAQ
ACGFS
ACLOT
ACRLP
ACVFH
ACZNC
ADBBV
ADCNI
ADEZE
ADGUI
ADTZH
AEBSH
AECPX
AEIPS
AEKER
AENEX
AEUPX
AFJKZ
AFPUW
AFTJW
AGHFR
AGUBO
AGYEJ
AHHHB
AHJVU
AHZHX
AIALX
AIEXJ
AIGII
AIIUN
AIKHN
AITUG
AKBMS
AKRWK
AKYEP
ALMA_UNASSIGNED_HOLDINGS
AMRAJ
ANKPU
AOUOD
APXCP
ARUGR
AXJTR
BJAXD
BKOJK
BLXMC
CS3
DU5
EBS
EFJIC
EFKBS
EFLBG
EO8
EO9
EP2
EP3
FDB
FIRID
FNPLU
FYGXN
G-Q
GBLVA
GBOLZ
IHE
J1W
JJJVA
KOM
MHUIS
MO0
N9A
O-L
O9-
OAUVE
OZT
P-8
P-9
P2P
PC.
PQQKQ
Q38
ROL
RPZ
SDF
SDG
SDP
SES
SEW
SPC
SPCBC
SST
SSV
SSW
SSZ
T5K
WH7
XPP
ZMT
~02
~G-
~HD
29L
9DU
AAQXK
AAYXX
ABDPE
ABWVN
ABXDB
ACNNM
ACRPL
ADJOM
ADMUD
ADNMO
AGQPQ
ASPBG
AVWKF
AZFZN
CITATION
EJD
FEDTE
FGOYB
G-2
HLZ
HVGLF
HZ~
LG9
LY7
M41
R2-
SBC
SET
UHS
WUQ
ID FETCH-LOGICAL-c172t-f12644c62adb94d6f0b8274aea9dcef7f10faee435810b885033f6a029b275da3
ISSN 0950-7051
IngestDate Sat Nov 29 06:54:20 EST 2025
Wed Dec 10 14:26:13 EST 2025
IsPeerReviewed true
IsScholarly true
Keywords Multimodal emotion recognition
Binary hashing
Cross-modal attention
Polarity-aware contrastive distillation
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c172t-f12644c62adb94d6f0b8274aea9dcef7f10faee435810b885033f6a029b275da3
ORCID 0009-0005-8903-0955
0000-0002-6336-0228
0000-0003-0163-0007
ParticipantIDs crossref_primary_10_1016_j_knosys_2025_114771
elsevier_sciencedirect_doi_10_1016_j_knosys_2025_114771
PublicationCentury 2000
PublicationDate 2025-12-03
PublicationDateYYYYMMDD 2025-12-03
PublicationDate_xml – month: 12
  year: 2025
  text: 2025-12-03
  day: 03
PublicationDecade 2020
PublicationTitle Knowledge-based systems
PublicationYear 2025
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References Russell (bib0017) 1980; 39
Zhang, Lin, Jiang, Chen (bib0044) 2024; 83
Wang, Zhang, Zhang, Niu, Zhang, Sankaranarayana, Caldwell, Gedeon (bib0027) 2025
Zhang, Gu (bib0019) 2020
Shen, Huang, Lan, Zheng (bib0022) 2024
Deng, Yan (bib0040) 2019; 88
Ge, He, Ke, Sun (bib0047) 2013
Yang, Wang, Yi, Zhu, Zadeh, Poria, Morency (bib0005) 2020
Liu, Wang, Jiang, An, Gu, Li, Zhang (bib0029) 2024; 305
M. Shamsi, M. Tahon, Training Speech Emotion Classifier Without Categorical Annotations, arXiv
Zhang, Zhou, Feng, Lai, Li, Pan, Yin, Yan (bib0016) 2018
(2015).
Johnson, Douze, Jégou (bib0046) 2021; 7
Cao, Long, Wang, Huang, Yu (bib0015) 2017
Liu, Wang, Shan, Chen (bib0013) 2016
Cao, Chao, Liu (bib0043) 2025; 225
Poria, Cambria, Hussain, Huang (bib0003) 2017; 37
Cao, Long, Wang, Yu (bib0014) 2016
Poria, Hazarika, Majumder, Naik, Cambria, Mihalcea (bib0006) 2019
Song, Su, Huang, Yang (bib0038) 2024; 147
Sze, Chen, Yang, Emer (bib0008) 2017; 105
Kollias, Zafeiriou (bib0018) 2019
Baltrušaitis, Ahuja, Morency (bib0002) 2019; 41
A. Ispas, T. Deschamps-Berger, L. Devillers, A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions, arXiv
Johnson, Douze, Jégou (bib0035) 2019; 7
(2024).
Zaken, Ravfogel, Goldberg (bib0034) 2022
(2022).
Jégou, Douze, Schmid (bib0012) 2011; 33
Li, Zuo, Mei, Zhong, Meng (bib0036) 2016
Chen, Ren, Ma, Wang (bib0048) 2025
Picard (bib0001) 1997
G. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network, arXiv
Sun, Li (bib0041) 2024
Ding, Zhang, Li, Wu (bib0030) 2023
Tsai, Bai, Liang, Kolter, Morency, Salakhutdinov (bib0004) 2019
Zhang, Ma (bib0037) 2018
Li, Chen, Wang (bib0049) 2025; 25
Li, Zhang, Liu (bib0042) 2023; 17
Han, Liu, Wei, Zhou, Xu, Long (bib0039) 2024; 20
Jacob, Kligys, Chen, Zhu, Tang, Howard, Adam, Kalenichenko (bib0011) 2018
Song, Gu, Zhao, Dong (bib0025) 2021
Li, Chen, Zhang, Xu (bib0031) 2024
Johnson, Douze, Jégou (bib0007) 2021; 7
Shi, Tao, Jin, Yang, Yuan, Wang (bib0033) 2023; 202
Yan, Guo, Xing, Xu (bib0024) 2024; 15
Radford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Krueger, Sutskever (bib0045) 2021
Yun, Park, Lee, Kim (bib0032) 2024
Liu, Wang, Gu, An, Zhao, Li, Zhang (bib0028) 2025; 110
Zhang, Chen, Chen (bib0023) 2023
Han, Pool, Tran, Dally (bib0009) 2015
Lu, Chen, Liang, Tan, Zeng, Hu (bib0026) 2025; 39
Liu (10.1016/j.knosys.2025.114771_bib0028) 2025; 110
Zhang (10.1016/j.knosys.2025.114771_bib0019) 2020
Cao (10.1016/j.knosys.2025.114771_bib0014) 2016
Wang (10.1016/j.knosys.2025.114771_bib0027) 2025
Li (10.1016/j.knosys.2025.114771_bib0031) 2024
Shen (10.1016/j.knosys.2025.114771_bib0022) 2024
Johnson (10.1016/j.knosys.2025.114771_bib0046) 2021; 7
Shi (10.1016/j.knosys.2025.114771_bib0033) 2023; 202
Radford (10.1016/j.knosys.2025.114771_bib0045) 2021
Jégou (10.1016/j.knosys.2025.114771_bib0012) 2011; 33
Cao (10.1016/j.knosys.2025.114771_bib0043) 2025; 225
Poria (10.1016/j.knosys.2025.114771_bib0003) 2017; 37
Zhang (10.1016/j.knosys.2025.114771_bib0023) 2023
Liu (10.1016/j.knosys.2025.114771_bib0029) 2024; 305
10.1016/j.knosys.2025.114771_bib0010
Liu (10.1016/j.knosys.2025.114771_bib0013) 2016
Ding (10.1016/j.knosys.2025.114771_bib0030) 2023
Yun (10.1016/j.knosys.2025.114771_bib0032) 2024
Han (10.1016/j.knosys.2025.114771_bib0039) 2024; 20
Li (10.1016/j.knosys.2025.114771_bib0042) 2023; 17
Tsai (10.1016/j.knosys.2025.114771_bib0004) 2019
Song (10.1016/j.knosys.2025.114771_bib0038) 2024; 147
Deng (10.1016/j.knosys.2025.114771_bib0040) 2019; 88
Sun (10.1016/j.knosys.2025.114771_bib0041) 2024
Song (10.1016/j.knosys.2025.114771_bib0025) 2021
Yan (10.1016/j.knosys.2025.114771_bib0024) 2024; 15
Ge (10.1016/j.knosys.2025.114771_bib0047) 2013
10.1016/j.knosys.2025.114771_bib0020
Zhang (10.1016/j.knosys.2025.114771_bib0044) 2024; 83
Han (10.1016/j.knosys.2025.114771_bib0009) 2015
10.1016/j.knosys.2025.114771_bib0021
Zaken (10.1016/j.knosys.2025.114771_bib0034) 2022
Yang (10.1016/j.knosys.2025.114771_sbref0005) 2020
Jacob (10.1016/j.knosys.2025.114771_bib0011) 2018
Picard (10.1016/j.knosys.2025.114771_bib0001) 1997
Chen (10.1016/j.knosys.2025.114771_bib0048) 2025
Zhang (10.1016/j.knosys.2025.114771_sbref0015) 2018
Baltrušaitis (10.1016/j.knosys.2025.114771_bib0002) 2019; 41
Kollias (10.1016/j.knosys.2025.114771_bib0018) 2019
Poria (10.1016/j.knosys.2025.114771_bib0006) 2019
Zhang (10.1016/j.knosys.2025.114771_bib0037) 2018
Lu (10.1016/j.knosys.2025.114771_bib0026) 2025; 39
Johnson (10.1016/j.knosys.2025.114771_bib0007) 2021; 7
Sze (10.1016/j.knosys.2025.114771_bib0008) 2017; 105
Russell (10.1016/j.knosys.2025.114771_bib0017) 1980; 39
Li (10.1016/j.knosys.2025.114771_bib0049) 2025; 25
Cao (10.1016/j.knosys.2025.114771_bib0015) 2017
Johnson (10.1016/j.knosys.2025.114771_bib0035) 2019; 7
Li (10.1016/j.knosys.2025.114771_bib0036) 2016
References_xml – volume: 305
  year: 2024
  ident: bib0029
  article-title: MAS-DGAT-Net: a dynamic graph attention network with multibranch feature extraction and staged fusion for EEG emotion recognition
  publication-title: Knowl. Based Syst.
– volume: 225
  year: 2025
  ident: bib0043
  article-title: Region-focused CNN with dynamic adaptive graph attention for EEG-based emotion recognition
  publication-title: Expert Syst. Appl.
– volume: 7
  start-page: 535
  year: 2021
  end-page: 547
  ident: bib0046
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
– start-page: 2740
  year: 2016
  end-page: 2749
  ident: bib0036
  article-title: Deep pairwise-supervised hashing for scalable image retrieval
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
– start-page: 347
  year: 2020
  end-page: 351
  ident: bib0019
  article-title: MTANet: facial affect recognition in the wild using multi-task learning convolutional network
  publication-title: Proc. IEEE FG Challenge & Workshop on Affective Behavior Analysis in the Wild
– start-page: 4512
  year: 2024
  end-page: 4523
  ident: bib0032
  article-title: TelME: teacher-leading multimodal fusion network for emotion recognition in conversation
  publication-title: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
– year: 2025
  ident: bib0048
  article-title: Semi-Supervised online cross-modal hashing
  publication-title: Proc. AAAI
– start-page: 1284
  year: 2024
  end-page: 1290
  ident: bib0031
  article-title: Contrastive transformer masked image hashing for efficient image retrieval
  publication-title: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)
– volume: 17
  year: 2023
  ident: bib0042
  article-title: Dual attention mechanism graph convolutional neural network for EEG emotion recognition
  publication-title: Front. Neurosci.
– volume: 147
  year: 2024
  ident: bib0038
  article-title: Deep self-enhancement hashing for robust multi-label cross-modal retrieval
  publication-title: Pattern Recognit.
– volume: 41
  start-page: 423
  year: 2019
  end-page: 443
  ident: bib0002
  article-title: Multimodal machine learning: a survey and taxonomy
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– start-page: 2314
  year: 2023
  end-page: 2323
  ident: bib0030
  article-title: Multimodal sentiment analysis via efficient multimodal transformer
  publication-title: Proceedings of the ACM International Conference on Multimedia (ACM MM)
– year: 1997
  ident: bib0001
  article-title: Affective Computing
– volume: 105
  start-page: 2295
  year: 2017
  end-page: 2329
  ident: bib0008
  article-title: Efficient processing of deep neural networks: a tutorial and survey
  publication-title: Proc. IEEE
– start-page: 7395
  year: 2023
  end-page: 7408
  ident: bib0023
  article-title: DualGATs: dual graph attention networks for emotion recognition in conversations
  publication-title: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
– year: 2025
  ident: bib0027
  article-title: Visual and textual prompts in VLLMs for enhancing emotion recognition
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
– volume: 33
  start-page: 117
  year: 2011
  end-page: 128
  ident: bib0012
  article-title: Product quantization for nearest neighbor search
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
– reference: A. Ispas, T. Deschamps-Berger, L. Devillers, A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions, arXiv:
– start-page: 1135
  year: 2015
  end-page: 1143
  ident: bib0009
  article-title: Learning both weights and connections for efficient neural networks
  publication-title: Proc. NeurIPS
– volume: 39
  start-page: 1447
  year: 2025
  end-page: 1455
  ident: bib0026
  article-title: Understanding emotional body expressions via large language models
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
– start-page: 8748
  year: 2021
  end-page: 8763
  ident: bib0045
  article-title: Learning transferable visual models from natural language supervision
  publication-title: Proceedings of the 38th International Conference on Machine Learning (ICML)
– volume: 25
  start-page: 4821
  year: 2025
  end-page: 4832
  ident: bib0049
  article-title: DSSTNet: emotion recognition using multi-scale EEG features through graph convolution
  publication-title: IEEE Sens J.
– start-page: 2064
  year: 2016
  end-page: 2072
  ident: bib0013
  article-title: Deep supervised hashing for fast image retrieval
  publication-title: Proc. CVPR
– volume: 39
  start-page: 1161
  year: 1980
  end-page: 1178
  ident: bib0017
  article-title: A circumplex model of affect
  publication-title: J. Pers. Soc. Psychol.
– start-page: 585
  year: 2021
  end-page: 595
  ident: bib0025
  article-title: Supervised prototypical contrastive learning for imbalanced emotion recognition
  publication-title: Proc. ACL
– start-page: 2704
  year: 2018
  end-page: 2713
  ident: bib0011
  article-title: Quantization and training of neural networks for efficient integer-arithmetic-only inference
  publication-title: Proc. CVPR
– reference: (2022).
– volume: 110
  year: 2025
  ident: bib0028
  article-title: Cross-subject emotion recognition by EEG driven spatio-temporal hybrid network based on domain adaptation and dynamic graph attention
  publication-title: Biomed. Signal Process. Control
– year: 2024
  ident: bib0041
  article-title: HCH: hierarchical contrastive hashing for emotion video retrieval
  publication-title: IJCAI
– start-page: 527
  year: 2019
  end-page: 536
  ident: bib0006
  article-title: MELD: a multimodal multi-party dataset for emotion recognition in conversations
  publication-title: Proc. ACL
– reference: G. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network, arXiv:
– volume: 202
  start-page: 31292
  year: 2023
  end-page: 31311
  ident: bib0033
  article-title: UPop: unified and progressive pruning for compressing vision-language transformers
  publication-title: Proceedings of the 40Th International Conference on Machine Learning
– start-page: 1
  year: 2022
  end-page: 9
  ident: bib0034
  article-title: BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models
  publication-title: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
– volume: 20
  year: 2024
  ident: bib0039
  article-title: Supervised hierarchical online hashing for cross-modal retrieval
  publication-title: ACM Trans. Multimedia Comput. Commun. Appl.
– reference: (2015).
– start-page: 2946
  year: 2013
  end-page: 2953
  ident: bib0047
  article-title: Optimized product quantization for approximate nearest neighbor search
  publication-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
– start-page: 6558
  year: 2019
  end-page: 6569
  ident: bib0004
  article-title: Multimodal transformer for unaligned multimodal language sequences
  publication-title: Proc. ACL
– reference: (2024).
– start-page: 430
  year: 2016
  end-page: 438
  ident: bib0014
  article-title: Correlation hashing network for efficient cross-modal retrieval
  publication-title: Proc. CVPR
– volume: 15
  start-page: 2042
  year: 2024
  end-page: 2054
  ident: bib0024
  article-title: Bridge graph attention based graph convolution network with multi-scale transformer for EEG emotion recognition
  publication-title: IEEE Trans Affect Comput
– start-page: 5609
  year: 2017
  end-page: 5618
  ident: bib0015
  article-title: HashNet: deep learning to hash by continuation
  publication-title: Proc. ICCV
– volume: 88
  start-page: 102
  year: 2019
  end-page: 112
  ident: bib0040
  article-title: HashEmotion: compact binary codes for facial valence-arousal retrieval
  publication-title: Image Vis. Comput.
– year: 2020
  ident: bib0005
  article-title: MTAG: modal-temporal attention graph for unaligned human multimodal language sequences
  publication-title: Proc. EMNLP
– year: 2018
  ident: bib0037
  article-title: SCH-GAN: semi-supervised cross-modal hashing via generative adversarial networks
  publication-title: ACM Multimedia
– volume: 7
  start-page: 535
  year: 2019
  end-page: 547
  ident: bib0035
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
– volume: 37
  start-page: 98
  year: 2017
  end-page: 125
  ident: bib0003
  article-title: Multimodal sentiment analysis: a review
  publication-title: Inf. Fusion
– volume: 7
  start-page: 535
  year: 2021
  end-page: 547
  ident: bib0007
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
– reference: M. Shamsi, M. Tahon, Training Speech Emotion Classifier Without Categorical Annotations, arXiv:
– start-page: 1227
  year: 2024
  end-page: 1235
  ident: bib0022
  article-title: Contrastive transformer cross-modal hashing for video-text retrieval
  publication-title: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24
– year: 2018
  ident: bib0016
  article-title: HashGAN: attention-aware deep adversarial hashing for cross-modal retrieval
  publication-title: Proc. ECCV
– start-page: 2702
  year: 2019
  end-page: 2708
  ident: bib0018
  article-title: Aff-wild2: extending the aff-wild database for affect recognition
  publication-title: Proc. IEEE International Conference on Computer Vision Workshops (ICCVW)
– volume: 83
  start-page: 90487
  year: 2024
  end-page: 90509
  ident: bib0044
  article-title: Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval
  publication-title: Multimed. Tools Appl.
– start-page: 1284
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0031
  article-title: Contrastive transformer masked image hashing for efficient image retrieval
  publication-title: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)
– start-page: 430
  year: 2016
  ident: 10.1016/j.knosys.2025.114771_bib0014
  article-title: Correlation hashing network for efficient cross-modal retrieval
– volume: 147
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0038
  article-title: Deep self-enhancement hashing for robust multi-label cross-modal retrieval
  publication-title: Pattern Recognit.
  doi: 10.1016/j.patcog.2023.110079
– volume: 41
  start-page: 423
  issue: 2
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0002
  article-title: Multimodal machine learning: a survey and taxonomy
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2018.2798607
– year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0048
  article-title: Semi-Supervised online cross-modal hashing
– start-page: 1135
  year: 2015
  ident: 10.1016/j.knosys.2025.114771_bib0009
  article-title: Learning both weights and connections for efficient neural networks
– volume: 39
  start-page: 1447
  issue: 2
  year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0026
  article-title: Understanding emotional body expressions via large language models
  publication-title: Proceedings of the AAAI Conference on Artificial Intelligence
  doi: 10.1609/aaai.v39i2.32135
– year: 2018
  ident: 10.1016/j.knosys.2025.114771_bib0037
  article-title: SCH-GAN: semi-supervised cross-modal hashing via generative adversarial networks
– year: 2018
  ident: 10.1016/j.knosys.2025.114771_sbref0015
  article-title: HashGAN: attention-aware deep adversarial hashing for cross-modal retrieval
– volume: 83
  start-page: 90487
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0044
  article-title: Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval
  publication-title: Multimed. Tools Appl.
  doi: 10.1007/s11042-024-19371-w
– volume: 7
  start-page: 535
  issue: 3
  year: 2021
  ident: 10.1016/j.knosys.2025.114771_bib0046
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
  doi: 10.1109/TBDATA.2019.2921572
– start-page: 585
  year: 2021
  ident: 10.1016/j.knosys.2025.114771_bib0025
  article-title: Supervised prototypical contrastive learning for imbalanced emotion recognition
– start-page: 4512
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0032
  article-title: TelME: teacher-leading multimodal fusion network for emotion recognition in conversation
  publication-title: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
– ident: 10.1016/j.knosys.2025.114771_bib0010
– volume: 7
  start-page: 535
  issue: 3
  year: 2021
  ident: 10.1016/j.knosys.2025.114771_bib0007
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
  doi: 10.1109/TBDATA.2019.2921572
– start-page: 527
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0006
  article-title: MELD: a multimodal multi-party dataset for emotion recognition in conversations
– volume: 25
  start-page: 4821
  issue: 6
  year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0049
  article-title: DSSTNet: emotion recognition using multi-scale EEG features through graph convolution
  publication-title: IEEE Sens J.
– ident: 10.1016/j.knosys.2025.114771_bib0021
– start-page: 2704
  year: 2018
  ident: 10.1016/j.knosys.2025.114771_bib0011
  article-title: Quantization and training of neural networks for efficient integer-arithmetic-only inference
– volume: 105
  start-page: 2295
  issue: 12
  year: 2017
  ident: 10.1016/j.knosys.2025.114771_bib0008
  article-title: Efficient processing of deep neural networks: a tutorial and survey
  publication-title: Proc. IEEE
  doi: 10.1109/JPROC.2017.2761740
– start-page: 5609
  year: 2017
  ident: 10.1016/j.knosys.2025.114771_bib0015
  article-title: HashNet: deep learning to hash by continuation
– volume: 37
  start-page: 98
  year: 2017
  ident: 10.1016/j.knosys.2025.114771_bib0003
  article-title: Multimodal sentiment analysis: a review
  publication-title: Inf. Fusion
  doi: 10.1016/j.inffus.2017.02.003
– start-page: 7395
  year: 2023
  ident: 10.1016/j.knosys.2025.114771_bib0023
  article-title: DualGATs: dual graph attention networks for emotion recognition in conversations
– ident: 10.1016/j.knosys.2025.114771_bib0020
  doi: 10.1145/3610661.3616190
– year: 1997
  ident: 10.1016/j.knosys.2025.114771_bib0001
– volume: 39
  start-page: 1161
  issue: 6
  year: 1980
  ident: 10.1016/j.knosys.2025.114771_bib0017
  article-title: A circumplex model of affect
  publication-title: J. Pers. Soc. Psychol.
  doi: 10.1037/h0077714
– volume: 7
  start-page: 535
  issue: 3
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0035
  article-title: Billion-scale similarity search with GPUs
  publication-title: IEEE Trans. Big Data
  doi: 10.1109/TBDATA.2019.2921572
– volume: 305
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0029
  article-title: MAS-DGAT-Net: a dynamic graph attention network with multibranch feature extraction and staged fusion for EEG emotion recognition
  publication-title: Knowl. Based Syst.
  doi: 10.1016/j.knosys.2024.112599
– volume: 110
  year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0028
  article-title: Cross-subject emotion recognition by EEG driven spatio-temporal hybrid network based on domain adaptation and dynamic graph attention
  publication-title: Biomed. Signal Process. Control
  doi: 10.1016/j.bspc.2025.108231
– start-page: 347
  year: 2020
  ident: 10.1016/j.knosys.2025.114771_bib0019
  article-title: MTANet: facial affect recognition in the wild using multi-task learning convolutional network
– start-page: 1
  year: 2022
  ident: 10.1016/j.knosys.2025.114771_bib0034
  article-title: BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models
– volume: 20
  issue: 4
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0039
  article-title: Supervised hierarchical online hashing for cross-modal retrieval
  publication-title: ACM Trans. Multimedia Comput. Commun. Appl.
  doi: 10.1145/3632527
– volume: 88
  start-page: 102
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0040
  article-title: HashEmotion: compact binary codes for facial valence-arousal retrieval
  publication-title: Image Vis. Comput.
– year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0041
  article-title: HCH: hierarchical contrastive hashing for emotion video retrieval
– start-page: 2946
  year: 2013
  ident: 10.1016/j.knosys.2025.114771_bib0047
  article-title: Optimized product quantization for approximate nearest neighbor search
– volume: 202
  start-page: 31292
  year: 2023
  ident: 10.1016/j.knosys.2025.114771_bib0033
  article-title: UPop: unified and progressive pruning for compressing vision-language transformers
– volume: 33
  start-page: 117
  issue: 1
  year: 2011
  ident: 10.1016/j.knosys.2025.114771_bib0012
  article-title: Product quantization for nearest neighbor search
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2010.57
– start-page: 2064
  year: 2016
  ident: 10.1016/j.knosys.2025.114771_bib0013
  article-title: Deep supervised hashing for fast image retrieval
– start-page: 8748
  year: 2021
  ident: 10.1016/j.knosys.2025.114771_bib0045
  article-title: Learning transferable visual models from natural language supervision
– start-page: 2314
  year: 2023
  ident: 10.1016/j.knosys.2025.114771_bib0030
  article-title: Multimodal sentiment analysis via efficient multimodal transformer
  publication-title: Proceedings of the ACM International Conference on Multimedia (ACM MM)
– start-page: 6558
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0004
  article-title: Multimodal transformer for unaligned multimodal language sequences
– volume: 17
  year: 2023
  ident: 10.1016/j.knosys.2025.114771_bib0042
  article-title: Dual attention mechanism graph convolutional neural network for EEG emotion recognition
  publication-title: Front. Neurosci.
– volume: 15
  start-page: 2042
  issue: 4
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0024
  article-title: Bridge graph attention based graph convolution network with multi-scale transformer for EEG emotion recognition
  publication-title: IEEE Trans Affect Comput
  doi: 10.1109/TAFFC.2024.3394873
– start-page: 2740
  year: 2016
  ident: 10.1016/j.knosys.2025.114771_bib0036
  article-title: Deep pairwise-supervised hashing for scalable image retrieval
– volume: 225
  year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0043
  article-title: Region-focused CNN with dynamic adaptive graph attention for EEG-based emotion recognition
  publication-title: Expert Syst. Appl.
– start-page: 1227
  year: 2024
  ident: 10.1016/j.knosys.2025.114771_bib0022
  article-title: Contrastive transformer cross-modal hashing for video-text retrieval
– start-page: 2702
  year: 2019
  ident: 10.1016/j.knosys.2025.114771_bib0018
  article-title: Aff-wild2: extending the aff-wild database for affect recognition
– year: 2020
  ident: 10.1016/j.knosys.2025.114771_sbref0005
  article-title: MTAG: modal-temporal attention graph for unaligned human multimodal language sequences
– year: 2025
  ident: 10.1016/j.knosys.2025.114771_bib0027
  article-title: Visual and textual prompts in VLLMs for enhancing emotion recognition
  publication-title: IEEE Trans. Circuits Syst. Video Technol.
SSID ssj0002218
Score 2.4404392
Snippet •Reframe multimodal emotion recognition as affect-oriented binary hashing with the Cross-Modal Emotion Hashing Network (CEHN).•Employ a three-hop gated...
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 114771
SubjectTerms Binary hashing
Cross-modal attention
Multimodal emotion recognition
Polarity-aware contrastive distillation
Title Cross-modal emotion hashing network: Efficient binary coding for large-scale multimodal emotion retrieval
URI https://dx.doi.org/10.1016/j.knosys.2025.114771
Volume 331
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals 2021
  issn: 0950-7051
  databaseCode: AIEXJ
  dateStart: 19950201
  customDbUrl:
  isFulltext: true
  dateEnd: 99991231
  titleUrlDefault: https://www.sciencedirect.com
  omitProxy: false
  ssIdentifier: ssj0002218
  providerName: Elsevier
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1LT-MwELbK48CF565gecgHbshVkqZxsjeEingJcWBFb5Ed26I8HFRKxXF_-o7jR1tYIThwiSonmVierzOTyecZhPYpZ6zLJCdpR0iScgZ_qZylJKtiCd6q6HIqmmYT9PIy7_eLq1brr98LM36gWuevr8XTt6oaxkDZZuvsF9QdhMIA_AalwxHUDsdPKf7I-D3yWAtYfGmb9Bzc2pZJB9qSvk0aoNfUjjBMAG635Fa18KzKB0MPJ8-gPmkZh7PShk0XrrGbiItsz31yjhjHKFyJ6BCxXwzsx32pb1k9wZLLVt8YzlIwQG7Q5LLD7Z45_KKn8xRJt-F8dGYSjhGhkSsv62yv365lrSe8m1HbkOWdYbc5hrv2va5h_m3zgPbk8tk62m_8W2AdekLbXWmllEZKaaXMoYUEoAmmfeHwtNc_C948SZoccZi9337ZcATfz-b_4c1UyHK9ipbduwY-tBhZQy2p19GK7-OBnVnfQIMpyGCnZOwggx1kfuMAGGwBgy1gMAAGTwEGTwATZAXA_EB_jnvXRyfEteAgFUS2I6JiEzBXWcIEL1KRqYjnCU2ZZIWopKIqjhSTMjVV9OBUbj6Kq4xFScFhOQXr_ETzutZyE2HBVRZzuFZS8M4p51UnjpXIc5HybpGoLUT8ypVPttJK-ZHGthD1y1u6aNFGgSVg5sM7f33xSdtoaQLoHTQ_Gr7IXbRYjUeD5-GeA8w_Y3uS4Q
linkProvider Elsevier
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Cross-modal+emotion+hashing+network%3A+Efficient+binary+coding+for+large-scale+multimodal+emotion+retrieval&rft.jtitle=Knowledge-based+systems&rft.au=Li%2C+Chenhao&rft.au=Huang%2C+Wenti&rft.au=Yang%2C+Zhan&rft.au=Long%2C+Jun&rft.date=2025-12-03&rft.issn=0950-7051&rft.volume=331&rft.spage=114771&rft_id=info:doi/10.1016%2Fj.knosys.2025.114771&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_knosys_2025_114771
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0950-7051&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0950-7051&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0950-7051&client=summon