STAT: Spatial-Temporal Attention Mechanism for Video Captioning

Video captioning refers to automatically generating natural language sentences that summarize video contents. Inspired by the visual attention mechanism of human beings, the temporal attention mechanism has been widely used in video description to selectively focus on important frames. However, most exi...

Bibliographic Details
Published in:	IEEE Transactions on Multimedia, Vol. 22, no. 1, pp. 229-241
Main Authors:	Yan, Chenggang; Tu, Yunbin; Wang, Xingzheng; Zhang, Yongbing; Hao, Xinhong; Zhang, Yongdong; Dai, Qionghai
Format:	Journal Article
Language:	English
Published:	Piscataway IEEE 01.01.2020 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Cider; Coders; Convolutional neural networks; Decoding; encoder-decoder neural networks; Encoders-Decoders; Feature extraction; Fuses; Neural networks; Performance evaluation; Semantics; Sentences; spatial-temporal attention mechanism; Video captioning; Video data; Visualization
ISSN:	1520-9210, 1941-0077