Spatial-Temporal Context-Aware Online Action Detection and Prediction
Spatial-temporal action detection in videos is a challenging problem that has attracted considerable attention in recent years. Most current approaches address action detection as an object detection problem, utilizing successful object detection frameworks such as Faster R-CNN to perform action detection at every single frame first, and then generating action tubes by linking bounding boxes across the whole video in an offline fashion. However, unlike object detection in static images, temporal context information is vital for action detection in videos. Therefore, we propose an online action detection model that leverages the spatial-temporal context information in videos to perform action inference and localization. More specifically, we depict the spatial-temporal context pattern of actions via an encoder-decoder model based on a convolutional recurrent neural network. The model accepts a video snippet as input and encodes the dynamic information inside the snippet in the forward pass. During the backward pass, the decoder resolves this information for action detection with the current appearance or motion cue at each time stamp. In addition, we devise an incremental action-tube construction algorithm that enables our model to accomplish action prediction ahead of time and to perform action detection in an online fashion. To evaluate the performance of our method, we conduct experiments on three popular public datasets: UCF-101, UCF-Sports, and J-HMDB-21. The experimental results demonstrate that our method achieves competitive or superior performance compared to the state-of-the-art methods. To encourage further research, we release our project at https://github.com/hjjpku/OATD.
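The abstract describes two components: a convolutional recurrent encoder-decoder that summarizes a video snippet, and an incremental algorithm that links per-frame detections into action tubes online. Neither implementation is included in this record; the PyTorch sketch below is only a minimal illustration of the first idea, assuming a standard convolutional GRU cell, with all class and variable names chosen here rather than taken from the authors' code.

```python
# Minimal sketch (not the authors' released code) of a convolutional recurrent
# encoder-decoder over a video snippet: encode frame features forward in time,
# then decode backward so each frame receives snippet-level temporal context.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: gates use 2-D convolutions so the hidden state
    keeps its spatial layout, which matters for spatial localization."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)  # update/reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)       # candidate state

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], dim=1))), 2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde


class SnippetEncoderDecoder(nn.Module):
    """Encode a T-frame snippet in the forward pass, then decode it backward,
    combining the decoder state with each frame's appearance/motion feature."""

    def __init__(self, feat_ch, hid_ch):
        super().__init__()
        self.enc = ConvGRUCell(feat_ch, hid_ch)
        self.dec = ConvGRUCell(feat_ch, hid_ch)

    def forward(self, feats):                 # feats: (T, B, C, H, W) per-frame features
        T, B, _, H, W = feats.shape
        h = feats.new_zeros(B, self.enc.cand.out_channels, H, W)
        for t in range(T):                    # forward (encoding) pass
            h = self.enc(feats[t], h)
        outs = []
        for t in reversed(range(T)):          # backward (decoding) pass
            h = self.dec(feats[t], h)
            outs.append(h)                    # context-aware feature map for frame t
        return outs[::-1]                     # restore temporal order
```

For the second component, the paper's exact tube-construction procedure is not reproduced in this record. The sketch below only illustrates a common greedy scheme for incremental linking (extend each existing tube with the best-scoring detection that sufficiently overlaps its last box, and start new tubes from unmatched detections); the IoU threshold and all helper names are assumptions for illustration.

```python
# Minimal sketch of online (incremental) action-tube construction by greedy
# IoU-based linking of per-frame detections; not the paper's exact algorithm.
from dataclasses import dataclass, field


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tube:
    boxes: list = field(default_factory=list)   # one box per processed frame
    scores: list = field(default_factory=list)  # per-frame action scores


def extend_tubes(tubes, detections, iou_thr=0.3):
    """Extend tubes in place with this frame's (box, score) detections."""
    used = set()
    for tube in tubes:
        best, best_score = None, -1.0
        for i, (box, score) in enumerate(detections):
            if i in used:
                continue
            if tube.boxes and iou(tube.boxes[-1], box) < iou_thr:
                continue                        # not enough spatial overlap
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            box, score = detections[best]
            tube.boxes.append(box)
            tube.scores.append(score)
    for i, (box, score) in enumerate(detections):
        if i not in used:                       # unmatched detections start new tubes
            tubes.append(Tube([box], [score]))
    return tubes
```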
Saved in:
| Published in: | IEEE Transactions on Circuits and Systems for Video Technology Vol. 30; No. 8; pp. 2650-2662 |
|---|---|
| Main authors: | Huang, Jingjia; Li, Nannan; Li, Thomas; Liu, Shan; Li, Ge |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.08.2020 |
| Subjects: | Algorithms; Coders; Computational modeling; Context; Context modeling; Electron tubes; encoder–decoder model; Object detection; Object recognition; online action tube generation; Performance evaluation; Predictive models; Proposals; Recurrent neural networks; Spatial–temporal action detection; Tubes; Videos |
| ISSN: | 1051-8215, 1558-2205 |
| Online access: | Full text |
| Abstract | Spatial-temporal action detection in videos is a challenging problem that has attracted considerable attention in recent years. Most current approaches address action detection as an object detection problem, utilizing successful object detection frameworks such as Faster R-CNN to perform action detection at every single frame first, and then generating action tubes by linking bounding boxes across the whole video in an offline fashion. However, unlike object detection in static images, temporal context information is vital for action detection in videos. Therefore, we propose an online action detection model that leverages the spatial-temporal context information in videos to perform action inference and localization. More specifically, we depict the spatial-temporal context pattern of actions via an encoder-decoder model based on a convolutional recurrent neural network. The model accepts a video snippet as input and encodes the dynamic information inside the snippet in the forward pass. During the backward pass, the decoder resolves this information for action detection with the current appearance or motion cue at each time stamp. In addition, we devise an incremental action-tube construction algorithm that enables our model to accomplish action prediction ahead of time and to perform action detection in an online fashion. To evaluate the performance of our method, we conduct experiments on three popular public datasets: UCF-101, UCF-Sports, and J-HMDB-21. The experimental results demonstrate that our method achieves competitive or superior performance compared to the state-of-the-art methods. To encourage further research, we release our project at https://github.com/hjjpku/OATD. |
|---|---|
| Author | Liu, Shan; Li, Ge; Li, Thomas; Huang, Jingjia; Li, Nannan |
| Author_xml | 1. Huang, Jingjia (ORCID 0000-0002-0834-3265), School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China; 2. Li, Nannan (ORCID 0000-0002-8274-5123), School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China; 3. Li, Thomas, AIIT, Peking University, Hangzhou, China; 4. Liu, Shan, Tencent Media Lab, Palo Alto, CA, USA; 5. Li, Ge (ORCID 0000-0003-0140-0949, email: geli@ece.pku.edu.cn), School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China |
| CODEN | ITCTEM |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DOI | 10.1109/TCSVT.2019.2923712 |
| Discipline | Engineering |
| EISSN | 1558-2205 |
| EndPage | 2662 |
| Genre | orig-research |
| GrantInformation_xml | National Natural Science Foundation of China (Grant 61602014; funder ID 10.13039/501100001809); Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (Grant ZDSYS201703031405467); National Engineering Laboratory for Video Technology—Shenzhen Division, Shenzhen Municipal Science and Technology Program (Grant JCYJ20170818141146428) |
| ISICitedReferencesCount | 25 |
| ISSN | 1051-8215 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-8274-5123 0000-0003-0140-0949 0000-0002-0834-3265 |
| PQID | 2431702692 |
| PQPubID | 85433 |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-08-01 |
| PublicationDateYYYYMMDD | 2020-08-01 |
| PublicationDecade | 2020 |
| PublicationPlace | New York |
| PublicationTitle | IEEE transactions on circuits and systems for video technology |
| PublicationTitleAbbrev | TCSVT |
| PublicationYear | 2020 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 2650 |
| SubjectTerms | Algorithms; Coders; Computational modeling; Context; Context modeling; Electron tubes; encoder–decoder model; Object detection; Object recognition; online action tube generation; Performance evaluation; Predictive models; Proposals; Recurrent neural networks; Spatial–temporal action detection; Tubes; Videos |
| Title | Spatial-Temporal Context-Aware Online Action Detection and Prediction |
| URI | https://ieeexplore.ieee.org/document/8741082 https://www.proquest.com/docview/2431702692 |
| Volume | 30 |