Stochastic human motion prediction using a quantized conditional diffusion model

Detailed bibliography
Published in: Knowledge-Based Systems, Volume 309, Article 112823
Main Authors: Huang, Biaozhang; Li, Xinde; Hu, Chuanfei; Li, Heqing
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 30.01.2025
ISSN: 0950-7051
Description
Summary: Human motion prediction is a fundamental task in computer vision, aiming to forecast future human poses based on observed motion sequences. Existing deterministic methods generate a single future motion sequence, neglecting the inherent stochasticity and diversity of human behaviors. To address this limitation, we propose a novel two-stage stochastic human motion prediction framework, termed the Quantized Conditional Diffusion Model (QCDM), which combines a Discrete Motion Quantization Module and a Conditional Motion Generation Module. Specifically, we first design a discrete motion quantization module that leverages Graph Convolutional Networks (GCNs) and one-dimensional temporal convolutions to encode motion sequences into continuous latent representations. These representations are then quantized into discrete latent variables using a learnable codebook. A decoder reconstructs the motion sequence from these discrete variables, preserving key motion patterns while eliminating redundancies. Next, we develop a conditional motion generation module that integrates GCNs and Transformers for denoising spatio-temporal features. The diffusion process iteratively refines noisy motion data by reversing a gradual noising procedure, modeling the distribution of plausible future motions. Action category information and observed historical motion segments are incorporated as conditions into the denoising process, enabling controllable generation of specific motions. Additionally, we introduce a diversity enhancement strategy that penalizes overly similar samples. This encourages the model to explore a wider range of plausible motions, thereby improving the diversity and richness of the prediction results. Extensive experiments demonstrate that the QCDM framework outperforms state-of-the-art methods in stochastic human motion prediction tasks, offering both accuracy and diversity in generated motion sequences.
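The codebook quantization step described in the abstract (continuous latents mapped to discrete entries of a learnable codebook) follows the general vector-quantization pattern. A minimal sketch in NumPy, with an illustrative codebook size and latent dimension that are not the paper's settings:

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z        : (T, D) array of per-timestep continuous latents
    codebook : (K, D) array of learnable codebook entries
    Returns the quantized latents (T, D) and the discrete code indices (T,).
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = dists.argmin(axis=1)  # nearest entry per timestep
    return codebook[idx], idx

# Illustrative sizes: K=8 codebook entries, D=4 latent dims, T=5 timesteps.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = rng.normal(size=(5, 4))
zq, idx = quantize(z, codebook)
```

In a trained model the codebook is learned jointly with the encoder and decoder; this sketch only shows the nearest-neighbour lookup that produces the discrete variables.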
• Combines motion quantization with a conditional diffusion model for motion prediction.
• Utilizes GCNs and temporal convolutions for efficient motion feature extraction.
• Integrates action category info for controllable, diverse motion generation.
• Implements diversity enhancement to reduce similarity in prediction samples.
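The diversity enhancement highlighted above penalizes overly similar samples. One simple way to realize such a penalty (a sketch only; the margin and the pairwise-distance form are illustrative assumptions, not the paper's loss) is to charge each pair of predicted sequences for how far their distance falls below a margin:

```python
import numpy as np

def diversity_penalty(samples, margin=1.0):
    """Sum, over all sample pairs, the shortfall of their distance below
    `margin` (an illustrative hyperparameter). Identical samples incur the
    full margin penalty per pair; well-separated samples incur none."""
    penalty = 0.0
    n = len(samples)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(samples[i] - samples[j])
            penalty += max(0.0, margin - d)
    return penalty

close = [np.zeros(3), np.zeros(3)]        # identical -> penalized
far = [np.zeros(3), 10.0 * np.ones(3)]    # well separated -> no penalty
```

Adding such a term to the training objective pushes the sampler toward a wider spread of plausible futures rather than near-duplicates.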
DOI: 10.1016/j.knosys.2024.112823