Neighborhood Geometric Structure-Preserving Variational Autoencoder for Smooth and Bounded Data Sources

Many data sources, such as human poses, lie on low-dimensional manifolds that are smooth and bounded. Learning low-dimensional representations for such data is an important problem. One typical solution is to utilize encoder-decoder networks. However, due to the lack of effective regularization in l...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transaction on neural networks and learning systems Ročník 33; číslo 8; s. 3598 - 3611
Hlavní autoři: Chen, Xingyu, Wang, Chunyu, Lan, Xuguang, Zheng, Nanning, Zeng, Wenjun
Médium: Journal Article
Jazyk:angličtina
Vydáno: United States IEEE 01.08.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:2162-237X, 2162-2388, 2162-2388
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Many data sources, such as human poses, lie on low-dimensional manifolds that are smooth and bounded. Learning low-dimensional representations for such data is an important problem. One typical solution is to utilize encoder-decoder networks. However, due to the lack of effective regularization in latent space, the learned representations usually do not preserve the essential data relations. For example, adjacent video frames in a sequence may be encoded into very different zones across the latent space with holes in between. This is problematic for many tasks such as denoising because slightly perturbed data have the risk of being encoded into very different latent variables, leaving output unpredictable. To resolve this problem, we first propose a neighborhood geometric structure-preserving variational autoencoder (SP-VAE), which not only maximizes the evidence lower bound but also encourages latent variables to preserve their structures as in ambient space. Then, we learn a set of small surfaces to approximately bound the learned manifold to deal with holes in latent space. We extensively validate the properties of our approach by reconstruction, denoising, and random image generation experiments on a number of data sources, including synthetic Swiss roll, human pose sequences, and facial expression images. The experimental results show that our approach learns more smooth manifolds than the baselines. We also apply our approach to the tasks of human pose refinement and facial expression image interpolation where it gets better results than the baselines.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2162-237X
2162-2388
2162-2388
DOI:10.1109/TNNLS.2021.3053591