Video Summarization With Attention-Based Encoder-Decoder Networks

Bibliographic Details
Published in:	IEEE transactions on circuits and systems for video technology Vol. 30; no. 6; pp. 1709 - 1717
Main Authors:	Ji, Zhong; Xiong, Kailin; Pang, Yanwei; Li, Xuelong
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2020
Subjects:	Additives; attention mechanism; Coders; Datasets; Decoding; encoder-decoder; Frames (data processing); Indexes; Internet; LSTM; Networks; Recurrent neural networks; Semantics; Video data; Video summarization; Visualization
ISSN:	1051-8215, 1558-2205

Description
Summary:	This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism that mimics the way humans select keyshots. To this end, we propose a novel video summarization framework named attentive encoder-decoder networks for video summarization (AVS), in which the encoder uses a bidirectional long short-term memory (BiLSTM) network to encode the contextual information among the input video frames. For the decoder, two attention-based LSTM networks are explored, using additive and multiplicative objective functions, respectively. Extensive experiments are conducted on two video summarization benchmark datasets, SumMe and TVSum. The results demonstrate the superiority of the proposed AVS-based approaches over state-of-the-art approaches, with remarkable improvements on both datasets.
DOI:	10.1109/TCSVT.2019.2904996
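
The summary names two decoder variants, additive and multiplicative. As a rough illustration only, the sketch below contrasts the two scoring forms as they are commonly defined in the attention literature; whether the paper uses exactly these forms is an assumption. All names here (additive_score, multiplicative_score, attend, w_enc, w_dec, v, w_mul) are hypothetical, the weights are random rather than learned, and the encoder states are random stand-ins for BiLSTM outputs.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_score(enc_state, dec_state, w_enc, w_dec, v):
    """Additive form (assumed): v . tanh(W_enc h + W_dec s)."""
    projected = [a + b for a, b in zip(matvec(w_enc, enc_state),
                                       matvec(w_dec, dec_state))]
    return dot(v, [math.tanh(p) for p in projected])


def multiplicative_score(enc_state, dec_state, w):
    """Multiplicative form (assumed): s . (W h)."""
    return dot(dec_state, matvec(w, enc_state))


def attend(enc_states, dec_state, score_fn):
    """Return attention weights over frames and the weighted context vector."""
    weights = softmax([score_fn(h, dec_state) for h in enc_states])
    dim = len(enc_states[0])
    context = [sum(a * h[i] for a, h in zip(weights, enc_states))
               for i in range(dim)]
    return weights, context


def random_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or encoder outputs."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_frames, dim = 5, 4
enc_states = random_matrix(num_frames, dim, rng)   # stand-in for BiLSTM outputs
dec_state = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
w_enc = random_matrix(dim, dim, rng)
w_dec = random_matrix(dim, dim, rng)
v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
w_mul = random_matrix(dim, dim, rng)

add_weights, add_context = attend(
    enc_states, dec_state,
    lambda h, s: additive_score(h, s, w_enc, w_dec, v))
mul_weights, mul_context = attend(
    enc_states, dec_state,
    lambda h, s: multiplicative_score(h, s, w_mul))
```

In both variants the weights form a distribution over the input frames, and the context vector is the weighted average of the encoder states that a decoder step would consume. The two forms differ only in how a frame is scored against the current decoder state.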