Self-Attention (SA)-ConvLSTM Encoder–Decoder Structure-Based Video Prediction for Dynamic Motion Estimation

Video prediction, which is the task of predicting future video frames based on past observations, remains a challenging problem because of the complexity and high dimensionality of spatiotemporal dynamics. To address the problems associated with spatiotemporal prediction, which is an important decis...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Applied sciences Jg. 14; H. 23; S. 11315
Hauptverfasser:	Kim, Jeongdae, Choo, Hyunseung, Jeong, Jongpil
Format:	Journal Article
Sprache:	Englisch
Veröffentlicht:	Basel MDPI AG 01.12.2024
Schlagworte:	Computational linguistics Datasets encoder–decoder Forecasts and trends Language processing Natural language interfaces Neural networks Performance evaluation SA-ConvLSTM self-attention memory module spatiotemporal video prediction
ISSN:	2076-3417, 2076-3417
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Video prediction, which is the task of predicting future video frames based on past observations, remains a challenging problem because of the complexity and high dimensionality of spatiotemporal dynamics. To address the problems associated with spatiotemporal prediction, which is an important decision-making tool in various fields, several deep learning models have been proposed. Convolutional long short-term memory (ConvLSTM) can capture space and time simultaneously and has shown excellent performance in various applications, such as image and video prediction, object detection, and semantic segmentation. However, ConvLSTM has limitations in capturing long-term temporal dependencies. To solve this problem, this study proposes an encoder–decoder structure using self-attention ConvLSTM (SA-ConvLSTM), which retains the advantages of ConvLSTM and effectively captures the long-range dependencies through the self-attention mechanism. The effectiveness of the encoder–decoder structure using SA-ConvLSTM was validated through experiments on the MovingMNIST, KTH dataset.
Bibliographie:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	2076-3417 2076-3417
DOI:	10.3390/app142311315