Time-Lag Aware Multi-Modal Variational Autoencoder Using Baseball Videos And Tweets For Prediction Of Important Scenes

Published in: Proceedings - International Conference on Image Processing, pp. 2678–2682
Main Authors: Hirasawa, Kaito; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
Format: Conference Paper
Language: English
Published: IEEE, 01.01.2021
ISSN: 2381-8549
Description
Summary: A novel method based on a time-lag aware multi-modal variational autoencoder for prediction of important scenes (TI-MVAE-PIS) using baseball videos and tweets posted on Twitter is presented in this paper. This paper makes the following two technical contributions. First, to effectively use heterogeneous data for the prediction of important scenes, textual, visual, and audio features obtained from tweets and videos are transformed into latent features, so that TI-MVAE-PIS can flexibly express the relationships between them in the constructed latent space. Second, since there are time-lags between tweets and the corresponding multiple previous events, TI-MVAE-PIS considers such time-lags when estimating their relationships, successfully deriving the latent features. These two contributions together enable accurate prediction of important scenes. Results of experiments using actual baseball videos and their corresponding tweets show the effectiveness of TI-MVAE-PIS.
DOI: 10.1109/ICIP42928.2021.9506496
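The abstract describes mapping textual, visual, and audio features into a shared latent space via a variational autoencoder. The toy sketch below illustrates only that general idea — per-modality Gaussian encoders with the reparameterization trick, fused into one latent vector. All dimensions, the linear encoders, and the concatenation-based fusion are illustrative assumptions, not the paper's actual TI-MVAE-PIS architecture (which additionally models time-lags between tweets and events).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_mu, w_logvar):
    """Toy linear encoder: maps a modality feature vector to the
    mean and log-variance of a Gaussian latent distribution."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Hypothetical feature dimensions for the three modalities
dims = {"text": 16, "visual": 32, "audio": 8}
latent_dim = 4

latents = []
for name, d in dims.items():
    x = rng.standard_normal(d)                      # stand-in features
    w_mu = rng.standard_normal((d, latent_dim)) * 0.1
    w_lv = rng.standard_normal((d, latent_dim)) * 0.1
    mu, logvar = encode(x, w_mu, w_lv)
    latents.append(reparameterize(mu, logvar, rng))

# Naive fusion into one shared latent representation, where
# cross-modal relationships could then be modeled
z_shared = np.concatenate(latents)
print(z_shared.shape)  # (12,)
```

In a real VAE these encoders would be trained networks optimized with a reconstruction loss plus a KL-divergence term; the sketch only shows the sampling and fusion mechanics.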