Jointly Learning Visual Poses and Pose Lexicon for Semantic Action Recognition

A novel method for semantic action recognition through learning a pose lexicon is presented in this paper. A pose lexicon comprises a set of semantic poses, a set of visual poses, and a probabilistic mapping between the visual and semantic poses. This paper assumes that both the visual poses and map...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	IEEE transactions on circuits and systems for video technology Ročník 30; číslo 2; s. 457 - 467
Hlavní autoři:	Zhou, Lijuan, Li, Wanqing, Ogunbona, Philip, Zhang, Zhengyou
Médium:	Journal Article
Jazyk:	angličtina
Vydáno:	New York IEEE 01.02.2020 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:	Algorithms Alignment Computational modeling Conditional probability Datasets Expectation-maximization algorithms Hidden Markov models Integrated circuit modeling Learning Mapping Markov chains pose lexicon Probabilistic logic Recognition Semantic action recognition Semantics Statistical analysis Visual observation visual pose Visualization
ISSN:	1051-8215, 1558-2205
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	A novel method for semantic action recognition through learning a pose lexicon is presented in this paper. A pose lexicon comprises a set of semantic poses, a set of visual poses, and a probabilistic mapping between the visual and semantic poses. This paper assumes that both the visual poses and mapping are hidden and proposes a method to simultaneously learn a visual pose model that estimates the likelihood of an observed video frame being generated from hidden visual poses, and a pose lexicon model establishes the probabilistic mapping between the hidden visual poses and the semantic poses parsed from textual instructions. Specifically, the proposed method consists of two-level hidden Markov models. One level represents the alignment between the visual poses and semantic poses. The other level represents a visual pose sequence, and each visual pose is modeled as a Gaussian mixture. An expectation-maximization algorithm is developed to train a pose lexicon. With the learned lexicon, action classification is formulated as a problem of finding the maximum posterior probability of a given sequence of video frames that follows a given sequence of semantic poses, constrained by the most likely visual pose and the alignment sequences. The proposed method was evaluated on MSRC-12, WorkoutSU-10, WorkoutUOW-18, Combined-15, Combined-17, and Combined-50 action datasets using cross-subject, cross-dataset, zero-shot, and seen/unseen protocols.
Bibliografie:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	1051-8215 1558-2205
DOI:	10.1109/TCSVT.2019.2890829