Jointly Learning Visual Poses and Pose Lexicon for Semantic Action Recognition

A novel method for semantic action recognition through learning a pose lexicon is presented in this paper. A pose lexicon comprises a set of semantic poses, a set of visual poses, and a probabilistic mapping between the visual and semantic poses. This paper assumes that both the visual poses and map...

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 2, pp. 457-467
Main authors: Zhou, Lijuan; Li, Wanqing; Ogunbona, Philip; Zhang, Zhengyou
Format: Journal Article
Language: English
Publication details: New York: IEEE, 1 February 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1051-8215, 1558-2205
Abstract A novel method for semantic action recognition through learning a pose lexicon is presented in this paper. A pose lexicon comprises a set of semantic poses, a set of visual poses, and a probabilistic mapping between the visual and semantic poses. This paper assumes that both the visual poses and the mapping are hidden and proposes a method to simultaneously learn a visual pose model, which estimates the likelihood of an observed video frame being generated from hidden visual poses, and a pose lexicon model, which establishes the probabilistic mapping between the hidden visual poses and the semantic poses parsed from textual instructions. Specifically, the proposed method consists of two-level hidden Markov models. One level represents the alignment between the visual poses and semantic poses. The other level represents a visual pose sequence, and each visual pose is modeled as a Gaussian mixture. An expectation-maximization algorithm is developed to train a pose lexicon. With the learned lexicon, action classification is formulated as a problem of finding the maximum posterior probability of a given sequence of video frames that follows a given sequence of semantic poses, constrained by the most likely visual pose and the alignment sequences. The proposed method was evaluated on MSRC-12, WorkoutSU-10, WorkoutUOW-18, Combined-15, Combined-17, and Combined-50 action datasets using cross-subject, cross-dataset, zero-shot, and seen/unseen protocols.
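The classification step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a lexicon that has already been learned, replaces each Gaussian mixture with a single isotropic Gaussian, uses uniform transition scores, and restricts the alignment to be monotonic (each frame either stays on the current semantic pose or advances to the next). All names (`lexicon`, `means`, `align_score`, `classify`) and all toy values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of an isotropic Gaussian with variance `var` per dimension."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)

def frame_score(x, sem_pose, lexicon, means, var):
    """Log-score of frame x under a semantic pose, maximised over the
    hidden visual pose (assumption: one Gaussian per visual pose)."""
    return max(
        math.log(p) + log_gauss(x, means[v], var)
        for v, p in lexicon[sem_pose].items()
        if p > 0
    )

def align_score(frames, sem_seq, lexicon, means, var=1.0):
    """Viterbi score of the best monotonic alignment of frames to sem_seq.
    The first frame must map to the first semantic pose and the last frame
    to the last one; transitions are unweighted (assumption)."""
    neg_inf = float("-inf")
    n = len(sem_seq)
    best = [neg_inf] * n
    best[0] = frame_score(frames[0], sem_seq[0], lexicon, means, var)
    for x in frames[1:]:
        new = [neg_inf] * n
        for j in range(n):
            # Either stay on semantic pose j or advance from pose j - 1.
            prev = best[j] if j == 0 else max(best[j], best[j - 1])
            if prev > neg_inf:
                new[j] = prev + frame_score(x, sem_seq[j], lexicon, means, var)
        best = new
    return best[-1]

def classify(frames, actions, lexicon, means, var=1.0):
    """Return the action whose semantic pose sequence scores highest."""
    return max(
        actions,
        key=lambda a: align_score(frames, actions[a], lexicon, means, var),
    )

# Toy, hand-made model: two visual poses on a 1-D feature axis.
means = {"v_down": (0.0,), "v_up": (1.0,)}
# lexicon[semantic pose][visual pose] = mapping probability (made-up values).
lexicon = {
    "arms_down": {"v_down": 0.9, "v_up": 0.1},
    "arms_up": {"v_down": 0.1, "v_up": 0.9},
}
# Each action is a sequence of semantic poses, as if parsed from text.
actions = {
    "raise": ["arms_down", "arms_up"],
    "lower": ["arms_up", "arms_down"],
}
frames = [(0.0,), (0.1,), (0.9,), (1.0,)]
predicted = classify(frames, actions, lexicon, means)
```

In this toy setting the frames move from the "down" region to the "up" region, so the sequence for `raise` aligns better than the one for `lower`. Training the lexicon and the visual pose model with expectation-maximization, as the abstract describes, is outside the scope of this sketch.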
Author Li, Wanqing
Ogunbona, Philip
Zhang, Zhengyou
Zhou, Lijuan
Author_xml – sequence: 1
  givenname: Lijuan
  orcidid: 0000-0002-6418-6284
  surname: Zhou
  fullname: Zhou, Lijuan
  email: lz683@uowmail.edu.au
  organization: AMRL, University of Wollongong, Wollongong, NSW, Australia
– sequence: 2
  givenname: Wanqing
  orcidid: 0000-0002-4427-2687
  surname: Li
  fullname: Li, Wanqing
  email: wanqing@uow.edu.au
  organization: AMRL, University of Wollongong, Wollongong, NSW, Australia
– sequence: 3
  givenname: Philip
  orcidid: 0000-0003-4119-2873
  surname: Ogunbona
  fullname: Ogunbona, Philip
  email: philipo@uow.edu.au
  organization: AMRL, University of Wollongong, Wollongong, NSW, Australia
– sequence: 4
  givenname: Zhengyou
  surname: Zhang
  fullname: Zhang, Zhengyou
  organization: Microsoft Research, Redmond, Shenzhen, WA, China
CODEN ITCTEM
CitedBy_id crossref_primary_10_1016_j_eswa_2025_126646
crossref_primary_10_1016_j_imavis_2024_104985
crossref_primary_10_1109_TASE_2025_3553495
crossref_primary_10_3390_s20113305
crossref_primary_10_3390_electronics12204328
crossref_primary_10_1109_TCSVT_2024_3434563
crossref_primary_10_1109_TCSVT_2021_3050807
crossref_primary_10_1016_j_eswa_2025_127420
Cites_doi 10.1007/978-3-319-10578-9_27
10.1109/ICPR.2014.451
10.3115/993268.993313
10.1109/CVPR.2017.55
10.1145/2733373.2806296
10.1109/DICTA.2014.7008101
10.1016/j.patcog.2017.06.035
10.1109/DICTA.2014.7008115
10.1007/s11263-016-0897-2
10.1109/TIFS.2013.2258152
10.1023/A:1007617005950
10.1109/TCSVT.2016.2628339
10.1109/THMS.2015.2504550
10.1109/ICIP.2015.7350781
10.1007/s11263-007-0122-4
10.1145/2207676.2208303
10.1109/SIU.2013.6531398
10.1109/CVPR.2013.123
10.1109/ICCV.2013.337
10.1109/CVPR.2011.5995353
10.18653/v1/D17-1052
10.1109/TCSVT.2008.2005597
10.1109/CVPR.2008.4587756
10.1109/FG.2017.100
10.1109/5.18626
10.1109/CVPRW.2012.6239233
10.1109/ICME.2016.7552882
10.1109/ICCV.2013.274
10.1007/978-3-642-33718-5_11
10.3115/v1/N15-1017
10.1109/CVPRW.2010.5543273
10.1007/978-3-030-01234-2_9
10.1109/TPAMI.2009.43
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TCSVT.2019.2890829
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2205
EndPage 467
ExternalDocumentID 10_1109_TCSVT_2019_2890829
8600338
Genre orig-research
ISICitedReferencesCount 16
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000521643900013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1051-8215
IngestDate Mon Jun 30 03:03:01 EDT 2025
Tue Nov 18 21:31:34 EST 2025
Sat Nov 29 01:44:12 EST 2025
Wed Aug 27 06:28:51 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-6418-6284
0000-0002-4427-2687
0000-0003-4119-2873
PQID 2352191309
PQPubID 85433
PageCount 11
ParticipantIDs crossref_citationtrail_10_1109_TCSVT_2019_2890829
crossref_primary_10_1109_TCSVT_2019_2890829
proquest_journals_2352191309
ieee_primary_8600338
PublicationCentury 2000
PublicationDate 2020-02-01
PublicationDateYYYYMMDD 2020-02-01
PublicationDate_xml – month: 02
  year: 2020
  text: 2020-02-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 457
SubjectTerms Algorithms
Alignment
Computational modeling
Conditional probability
Datasets
Expectation-maximization algorithms
Hidden Markov models
Integrated circuit modeling
Learning
Mapping
Markov chains
pose lexicon
Probabilistic logic
Recognition
Semantic action recognition
Semantics
Statistical analysis
Visual observation
visual pose
Visualization
Title Jointly Learning Visual Poses and Pose Lexicon for Semantic Action Recognition
URI https://ieeexplore.ieee.org/document/8600338
https://www.proquest.com/docview/2352191309
Volume 30
WOSCitedRecordID wos000521643900013&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D