UntrimmedNets for Weakly Supervised Action Recognition and Detection
Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action reco...
| Published in: | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 6402 - 6411 |
|---|---|
| Main Authors: | Wang, Limin; Xiong, Yuanjun; Lin, Dahua; Van Gool, Luc |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.07.2017 |
| Subjects: | Adaptation models; Feature extraction; Motion pictures; Proposals; Training; Videos; Visualization |
| ISSN: | 1063-6919 |
| Online Access: | https://ieeexplore.ieee.org/document/8100161 |
| Abstract | Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without the requirement of temporal annotations of action instances. Our UntrimmedNet couples two important components, the classification module and the selection module, to learn the action models and reason about the temporal duration of action instances, respectively. These two components are implemented with feed-forward networks, and UntrimmedNet is therefore an end-to-end trainable architecture. We exploit the learned models for action recognition (WSR) and detection (WSD) on the untrimmed video datasets of THUMOS14 and ActivityNet. Although our UntrimmedNet only employs weak supervision, our method achieves performance superior or comparable to that of those strongly supervised approaches on these two datasets. |
|---|---|
| Author | Xiong, Yuanjun; Van Gool, Luc; Lin, Dahua; Wang, Limin |
| Author_xml | 1. Wang, Limin (Comput. Vision Lab., ETH Zurich, Zurich, Switzerland); 2. Xiong, Yuanjun (Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Hong Kong, China); 3. Lin, Dahua (Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Hong Kong, China); 4. Van Gool, Luc (Computer Vision Laboratory, ETH Zurich, Switzerland) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/CVPR.2017.678 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore Digital Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences; Computer Science |
| EISBN | 1538604574; 9781538604571 |
| EISSN | 1063-6919 |
| EndPage | 6411 |
| ExternalDocumentID | 8100161 |
| Genre | orig-research |
| GroupedDBID | 23M 29F 29O 6IE 6IH 6IK ABDPE ACGFS ALMA_UNASSIGNED_HOLDINGS CBEJK IPLJI M43 RIE RIO RNS |
| ID | FETCH-LOGICAL-i175t-98d6617341893ecb5f290b9e05ce2bd7f5d47b1e472074ab973fd2b7cffb90623 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 423 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000418371406053&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1063-6919 |
| IngestDate | Wed Aug 27 06:13:56 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i175t-98d6617341893ecb5f290b9e05ce2bd7f5d47b1e472074ab973fd2b7cffb90623 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_8100161 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-July |
| PublicationDateYYYYMMDD | 2017-07-01 |
| PublicationDate_xml | – month: 07 year: 2017 text: 2017-July |
| PublicationDecade | 2010 |
| PublicationTitle | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0023720 ssj0003211698 |
| Score | 2.5893545 |
| Snippet | Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 6402 |
| SubjectTerms | Adaptation models; Feature extraction; Motion pictures; Proposals; Training; Videos; Visualization |
| Title | UntrimmedNets for Weakly Supervised Action Recognition and Detection |
| URI | https://ieeexplore.ieee.org/document/8100161 |
| WOSCitedRecordID | wos000418371406053&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
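The abstract describes two coupled components: a classification module that learns the action models and a selection module that reasons about the temporal duration of action instances. The record does not say how their outputs are combined, so the following is only a minimal sketch of one plausible coupling, assuming an attention-style weighting in which per-clip selection scores are normalized over the clips of a video and used to average the per-clip class scores. The function names (`softmax`, `video_level_prediction`, `video_label_loss`) and the toy numbers are hypothetical illustrations, not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def video_level_prediction(clip_class_scores, clip_selection_scores):
    """Combine per-clip outputs into one video-level class distribution.

    clip_class_scores: one list of raw class scores per clip
        (stand-in for the classification module's output).
    clip_selection_scores: one raw score per clip
        (stand-in for the selection module's output).
    Assumption: clips are weighted by a softmax over selection scores.
    """
    if not clip_class_scores or len(clip_class_scores) != len(clip_selection_scores):
        raise ValueError("need one selection score per clip and at least one clip")
    weights = softmax(clip_selection_scores)
    num_classes = len(clip_class_scores[0])
    # Weighted average of clip scores, one value per class.
    video_scores = [
        sum(w * scores[c] for w, scores in zip(weights, clip_class_scores))
        for c in range(num_classes)
    ]
    return softmax(video_scores), weights


def video_label_loss(class_probs, label):
    """Cross-entropy against a single video-level label (the weak supervision)."""
    return -math.log(class_probs[label])


# Toy example: three clips, two action classes, video-level label 0.
clip_class_scores = [[2.0, 0.1], [0.0, 0.3], [1.5, 0.2]]
clip_selection_scores = [3.0, -2.0, 1.0]
probs, weights = video_level_prediction(clip_class_scores, clip_selection_scores)
loss = video_label_loss(probs, 0)
```

Because both steps are differentiable, a loss computed only from the video-level label can in principle update both modules, which is consistent with the end-to-end training the abstract mentions. The per-clip weights also give a rough indication of which clips the selection step favors.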