Spatial-Temporal Context-Aware Online Action Detection and Prediction


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 30, No. 8, pp. 2650-2662
Main authors: Huang, Jingjia; Li, Nannan; Li, Thomas; Liu, Shan; Li, Ge
Format: Journal Article
Language: English
Published: New York: IEEE, 01.08.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1051-8215, 1558-2205
Online access: Full text
Abstract Spatial-temporal action detection in videos is a challenging problem that has attracted considerable attention in recent years. Most current approaches cast action detection as an object detection problem: they apply successful object detection frameworks such as Faster R-CNN to detect actions in every single frame first, and then generate action tubes by linking bounding boxes across the whole video in an offline fashion. However, unlike object detection in static images, temporal context information is vital for action detection in videos. We therefore propose an online action detection model that leverages the spatial-temporal context information in videos to perform action inference and localization. More specifically, we model the spatial-temporal context pattern of actions with an encoder-decoder built on a convolutional recurrent neural network. The model accepts a video snippet as input and encodes the dynamic information inside the snippet in the forward pass. In the backward pass, the decoder resolves this information for action detection, combining it with the current appearance or motion cue at each time stamp. In addition, we devise an incremental action-tube construction algorithm that enables the model to predict actions ahead of time and to perform action detection in an online fashion. To evaluate our method, we conduct experiments on three popular public datasets: UCF-101, UCF-Sports, and J-HMDB-21. The results demonstrate that our method achieves competitive or superior performance compared with state-of-the-art methods. To encourage further research, we release our project at https://github.com/hjjpku/OATD.
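
For intuition about the encoder-decoder design the abstract describes, the following is a minimal PyTorch sketch of a convolutional-GRU encoder that runs forward over a video snippet and a decoder that runs backward, fusing the accumulated context with each frame's feature map. The class names (ConvGRUCell, SnippetEncoderDecoder), the single-cell design, and the channel sizes are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """A single convolutional GRU cell operating on 2D feature maps."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        # update (z) and reset (r) gates from the frame feature and hidden state
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_cand = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_cand

class SnippetEncoderDecoder(nn.Module):
    """Encode a snippet forward in time, then decode backward for detection."""
    def __init__(self, feat_ch=512, hid_ch=256):
        super().__init__()
        self.hid_ch = hid_ch
        self.enc = ConvGRUCell(feat_ch, hid_ch)
        self.dec = ConvGRUCell(feat_ch, hid_ch)

    def forward(self, feats):
        # feats: list of T per-frame feature maps, each of shape [B, C, H, W]
        b, _, hgt, wid = feats[0].shape
        h = feats[0].new_zeros(b, self.hid_ch, hgt, wid)
        for x in feats:               # forward pass: encode snippet dynamics
            h = self.enc(x, h)
        outs = []
        for x in reversed(feats):     # backward pass: decode, fusing context
            h = self.dec(x, h)        # with the current frame's cue
            outs.append(h)
        outs.reverse()                # per-time-stamp features for a detection head
        return outs

# Usage with dummy per-frame CNN features (e.g., from a backbone network):
# frames = [torch.randn(1, 512, 14, 14) for _ in range(8)]
# ctx = SnippetEncoderDecoder()(frames)   # 8 context maps, one per time stamp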
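
The incremental action-tube construction mentioned in the abstract can be pictured as greedy online linking: as each frame's detections arrive, each either extends an existing tube (same class, sufficient overlap with the tube's last box) or starts a new one, so partial tubes, and hence early action predictions, are available at every time stamp rather than only after offline linking over the whole video. The sketch below is a generic version of such a linker; the thresholds, data layout, and termination rule (TubeBuilder, iou_thr, max_gap) are assumptions, not the paper's exact algorithm.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

class TubeBuilder:
    """Grow action tubes online from per-frame detections."""
    def __init__(self, iou_thr=0.3, max_gap=5):
        self.iou_thr, self.max_gap = iou_thr, max_gap
        self.tubes = []   # each: {'cls', 'boxes', 'score_sum', 'misses'}

    def step(self, detections):
        """detections: list of (box, cls, score) for the newest frame."""
        used = set()
        for tube in self.tubes:
            # greedily pick the best same-class detection overlapping
            # the tube's most recent box
            best, best_iou = None, self.iou_thr
            for i, (box, cls, score) in enumerate(detections):
                if i in used or cls != tube['cls']:
                    continue
                ov = iou(tube['boxes'][-1], box)
                if ov > best_iou:
                    best, best_iou = i, ov
            if best is not None:
                box, _, score = detections[best]
                tube['boxes'].append(box)
                tube['score_sum'] += score
                tube['misses'] = 0
                used.add(best)
            else:
                tube['misses'] += 1
        # terminate stale tubes; start new tubes from unmatched detections
        self.tubes = [t for t in self.tubes if t['misses'] <= self.max_gap]
        for i, (box, cls, score) in enumerate(detections):
            if i not in used:
                self.tubes.append({'cls': cls, 'boxes': [box],
                                   'score_sum': score, 'misses': 0})
        return self.tubes   # current partial tubes: predictions ahead of time

# Usage: call step() once per frame as detections arrive.
# builder = TubeBuilder()
# tubes = builder.step([((10, 10, 50, 80), 'run', 0.9)])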
Author Liu, Shan
Li, Ge
Li, Thomas
Huang, Jingjia
Li, Nannan
Author_xml – sequence: 1
  givenname: Jingjia
  orcidid: 0000-0002-0834-3265
  surname: Huang
  fullname: Huang, Jingjia
  organization: School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
– sequence: 2
  givenname: Nannan
  orcidid: 0000-0002-8274-5123
  surname: Li
  fullname: Li, Nannan
  organization: School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
– sequence: 3
  givenname: Thomas
  surname: Li
  fullname: Li, Thomas
  organization: AIIT, Peking University, Hangzhou, China
– sequence: 4
  givenname: Shan
  surname: Liu
  fullname: Liu, Shan
  organization: Tencent Media Lab, Palo Alto, CA, USA
– sequence: 5
  givenname: Ge
  orcidid: 0000-0003-0140-0949
  surname: Li
  fullname: Li, Ge
  email: geli@ece.pku.edu.cn
  organization: School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China
CODEN ITCTEM
CitedBy_id crossref_primary_10_1016_j_neucom_2022_03_069
crossref_primary_10_3390_rs15174198
crossref_primary_10_1109_TCSVT_2022_3156058
crossref_primary_10_1109_TCSS_2024_3383270
crossref_primary_10_1109_TCSVT_2023_3326692
crossref_primary_10_1109_TCSVT_2022_3169842
crossref_primary_10_1109_TCSVT_2022_3149329
crossref_primary_10_1109_TNNLS_2025_3550910
crossref_primary_10_1109_JBHI_2021_3102612
crossref_primary_10_1109_TCSVT_2022_3232021
crossref_primary_10_1109_TCSVT_2023_3234307
crossref_primary_10_1016_j_image_2024_117224
crossref_primary_10_1007_s13042_021_01301_z
crossref_primary_10_1109_ACCESS_2024_3388532
crossref_primary_10_1016_j_neucom_2024_129246
crossref_primary_10_1109_TCSVT_2023_3321508
crossref_primary_10_1016_j_entcom_2025_101003
crossref_primary_10_1109_TMM_2021_3050069
crossref_primary_10_1109_TCSVT_2024_3387933
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TCSVT.2019.2923712
Discipline Engineering
EISSN 1558-2205
EndPage 2662
ExternalDocumentID 10_1109_TCSVT_2019_2923712
8741082
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61602014
  funderid: 10.13039/501100001809
– fundername: Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality
  grantid: ZDSYS201703031405467
– fundername: National Engineering Laboratory for Video Technology—Shenzhen Division, Shenzhen Municipal Science and Technology Program
  grantid: JCYJ20170818141146428
ISICitedReferencesCount 25
ISSN 1051-8215
IsPeerReviewed true
IsScholarly true
Issue 8
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-8274-5123
0000-0003-0140-0949
0000-0002-0834-3265
PQID 2431702692
PQPubID 85433
PageCount 13
PublicationCentury 2000
PublicationDate 2020-08-01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 2650
SubjectTerms Algorithms
Coders
Computational modeling
Context
Context modeling
Electron tubes
encoder–decoder model
Object detection
Object recognition
online action tube generation
Performance evaluation
Predictive models
Proposals
Recurrent neural networks
Spatial–temporal action detection
Tubes
Videos
Title Spatial-Temporal Context-Aware Online Action Detection and Prediction
URI https://ieeexplore.ieee.org/document/8741082
https://www.proquest.com/docview/2431702692
Volume 30