Sequential Variational Autoencoder with Adversarial Classifier for Video Disentanglement

Bibliographic details
Published in:	Sensors (Basel, Switzerland), Volume 23, Issue 5, p. 2515
Main authors:	Haga, Takeshi; Kera, Hiroshi; Kawamoto, Kazuhiko
Format:	Journal Article
Language:	English
Published:	Switzerland, MDPI AG, 24 February 2023
Subjects:	adversarial training; auxiliary adversarial classifier; Bias; Datasets; Digital video; Image processing; inductive biases; Machine learning; Methods; Neural networks; Normal distribution; Sensors; sequential variational autoencoder; Supervision; Variables; video disentanglement; Japan
ISSN:	1424-8220

Description
Summary:	In this paper, we propose a sequential variational autoencoder for video disentanglement, which is a representation learning method that can be used to separately extract static and dynamic features from videos. Building sequential variational autoencoders with a two-stream architecture induces inductive bias for video disentanglement. However, our preliminary experiment demonstrated that the two-stream architecture is insufficient for video disentanglement because static features frequently contain dynamic features. Additionally, we found that dynamic features are not discriminative in the latent space. To address these problems, we introduced an adversarial classifier using supervised learning into the two-stream architecture. The strong inductive bias through supervision separates dynamic features from static features and yields discriminative representations of the dynamic features. Through a comparison with other sequential variational autoencoders, we qualitatively and quantitatively demonstrate the effectiveness of the proposed method on the Sprites and MUG datasets.
DOI:	10.3390/s23052515
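
Note:	The summary describes a two-stream sequential variational autoencoder combined with an adversarial classifier. As a rough orientation only, an objective of this general kind is often written as below. This is a sketch under stated assumptions and is not taken from the article: the symbols z_s (static latent), z_{1:T} (per-frame dynamic latents), the classifier C_\psi, the supervised label y, the weight \lambda, and the factorized posterior q(z_s | x_{1:T}) \prod_t q(z_t | x_{1:T}) are all introduced here for illustration, and the article's actual loss may differ.

```latex
% Assumed notation (not from the article):
%   x_{1:T}  video frames;  z_s  static latent;  z_t  dynamic latent at frame t
%   q        joint approximate posterior  q(z_s | x_{1:T}) \prod_t q(z_t | x_{1:T})
%   C_\psi   auxiliary classifier predicting a supervised label y from z_s
\begin{align}
\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)
  &= \mathbb{E}_{q}\Big[\sum_{t=1}^{T} \log p_\theta(x_t \mid z_s, z_t)\Big]
   - \mathrm{KL}\big(q_\phi(z_s \mid x_{1:T}) \,\|\, p(z_s)\big) \notag\\
  &\quad - \sum_{t=1}^{T} \mathbb{E}_{q}\Big[
     \mathrm{KL}\big(q_\phi(z_t \mid x_{1:T}) \,\|\, p_\theta(z_t \mid z_{<t})\big)\Big],
\\[4pt]
\min_{\theta,\phi}\;\max_{\psi}\;
  &\; -\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)
   + \lambda\, \mathbb{E}_{q_\phi(z_s \mid x_{1:T})}\big[\log C_\psi(y \mid z_s)\big].
\end{align}
```

In this reading, the classifier (parameters \psi) tries to predict the label from the static latent, while the encoder (parameters \phi) is trained to make that prediction fail, which would discourage dynamic information from leaking into z_s. How the article obtains discriminative dynamic features is not specified in the summary and is not modeled here.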