KEMoS: A knowledge-enhanced multi-modal summarizing framework for Chinese online meetings

Bibliographic Details
Published in:	Neural networks Vol. 178; p. 106417
Main Authors:	Qi, Peng; Sun, Yan; Yao, Muyan; Tao, Dan
Format:	Journal Article
Language:	English
Published:	United States: Elsevier Ltd, 01.10.2024
Subjects:	Algorithms; China; Cluster Analysis; East Asian People; Humans; Knowledge; Multi-modal enhanced encoding strategy; Multi-modal meeting knowledge graph; Neural Networks, Computer; Semantics; Topic-based hierarchical clustering approach; Topic-enhanced decoding strategy; Videoconferencing
ISSN:	0893-6080, 1879-2782

Description
Summary:	Demand for online meetings and collaborative office work has surged in recent years, producing a large volume of meeting data. Providing participants with fast, accurate summaries has therefore attracted extensive attention. Existing meeting summarization models make little use of multi-modal information and do not address information offsetting during summarization. In this paper, we develop a knowledge-enhanced multi-modal summarizing framework. First, we construct a three-layer multi-modal meeting knowledge graph, comprising a basic layer, a knowledge layer, and a multi-modal layer, to integrate meeting information thoroughly. Second, we propose a topic-based hierarchical clustering approach that considers information entropy and difference simultaneously to capture the semantic evolution of meetings. Third, we devise a multi-modal enhanced encoding strategy, consisting of a sentence-level cross-modal encoder, a joint loss function, and a knowledge graph embedding module, to learn meeting-level and topic-level representations. Finally, for summary generation, we design a topic-enhanced decoding strategy for the Transformer decoder that uses topic information to mitigate semantic offsetting. Extensive experiments show that the proposed framework consistently outperforms state-of-the-art solutions on the Chinese meeting dataset, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 49.98%, 21.03%, and 32.03%, respectively.
DOI:	10.1016/j.neunet.2024.106417
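
Note on the reported metrics: ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated summary and a reference summary, and ROUGE-L is based on their longest common subsequence. The following is a minimal sketch of how such F1 scores can be computed. It assumes character-level tokenization of Chinese text, which is an assumption made here for illustration; this record does not state how the paper tokenizes text or which ROUGE implementation it uses. The function names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' takes per-key minimum
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Illustrative sentences (not taken from the paper's dataset),
# split into characters as an assumed tokenization.
candidate = list("会议讨论了预算")
reference = list("会议确定了预算")

print(round(rouge_n(candidate, reference, 1), 3))  # 0.714
print(round(rouge_n(candidate, reference, 2), 3))  # 0.5
print(round(rouge_l(candidate, reference), 3))     # 0.714
```

Published ROUGE figures usually come from an established toolkit with its own tokenization and stemming settings, so scores from this sketch are not directly comparable to the values in the summary.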