KEMoS: A knowledge-enhanced multi-modal summarizing framework for Chinese online meetings

Bibliographic Details
Published in:	Neural Networks, Vol. 178, p. 106417
Main Authors:	Qi, Peng; Sun, Yan; Yao, Muyan; Tao, Dan
Format:	Journal Article
Language:	English
Published:	United States, Elsevier Ltd, 1 October 2024
Subjects:	Algorithms; China; Cluster Analysis; East Asian People; Humans; Knowledge; Multi-modal enhanced encoding strategy; Multi-modal meeting knowledge graph; Neural Networks, Computer; Semantics; Topic-based hierarchical clustering approach; Topic-enhanced decoding strategy; Videoconferencing
ISSN:	0893-6080, 1879-2782

Description
Summary:	Demand for “online meetings” and “collaborative office work” has surged recently, producing an abundance of relevant data. How to provide participants with an accurate and fast summarizing service has attracted extensive attention. Existing meeting summarizing models overlook the use of multi-modal information and the information offsetting that occurs during summarizing. In this paper, we develop a knowledge-enhanced multi-modal summarizing framework. First, we construct a three-layer multi-modal meeting knowledge graph, comprising basic, knowledge, and multi-modal layers, to integrate meeting information thoroughly. Then, we propose a topic-based hierarchical clustering approach, which considers information entropy and difference simultaneously, to capture the semantic evolution of meetings. Next, we devise a multi-modal enhanced encoding strategy, including a sentence-level cross-modal encoder, a joint loss function, and a knowledge graph embedding module, to learn the meeting-level and topic-level representations. Finally, when generating summaries, we design a topic-enhanced decoding strategy for the Transformer decoder, which mitigates semantic offsetting with the aid of topic information. Extensive experiments show that our proposed work consistently outperforms state-of-the-art solutions on the Chinese meeting dataset, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 49.98%, 21.03%, and 32.03%, respectively.
DOI:	10.1016/j.neunet.2024.106417
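
Note on the reported metrics: the summary quotes ROUGE-1, ROUGE-2, and ROUGE-L without saying how they were computed. The sketch below is one common way to compute them, not the authors' evaluation code. It assumes character-level tokens (a frequent choice for Chinese text) and the F1 variant of each score; the function names `ngrams`, `rouge_n`, `lcs_length`, and `rouge_l` are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as an F1 score over clipped n-gram overlap (assumed variant)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as an F1 score over the longest common subsequence."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example sentences, split into characters.
    candidate = list("今天开会讨论预算")
    reference = list("今天会议讨论预算")
    for name, score in [
        ("ROUGE-1", rouge_n(candidate, reference, 1)),
        ("ROUGE-2", rouge_n(candidate, reference, 2)),
        ("ROUGE-L", rouge_l(candidate, reference)),
    ]:
        print(f"{name}: {score:.4f}")
```

Published results are usually produced with an established ROUGE toolkit and a specific tokenizer, so scores from this sketch may not be directly comparable to the figures quoted in the summary.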