ClickVOS: Click Video Object Segmentation
The Video Object Segmentation (VOS) task aims to segment objects in videos. However, previous settings either require time-consuming manual masks of target objects in the first frame during inference or lack the flexibility to specify arbitrary objects of interest. To address these limitations, we propo...
| Published in: | IEEE Transactions on Circuits and Systems for Video Technology, p. 1 |
|---|---|
| Main authors: | Guo, Pinxue; Hong, Lingyi; Zhou, Xinyu; Gao, Shuyong; Li, Wanyun; Li, Jinglun; Chen, Zhaoyu; Li, Xiaoqiang; Zhang, Wei; Zhang, Wenqiang |
| Format: | Journal Article |
| Language: | English |
| Publication details: | IEEE, 2025 |
| Subjects: | |
| ISSN: | 1051-8215, 1558-2205 |
| Online access: | Get full text |
| Abstract | The Video Object Segmentation (VOS) task aims to segment objects in videos. However, previous settings either require time-consuming manual masks of target objects in the first frame during inference or lack the flexibility to specify arbitrary objects of interest. To address these limitations, we propose a setting named Click Video Object Segmentation (ClickVOS), which segments objects of interest across the whole video according to a single click per object in the first frame. We also provide the extended datasets DAVIS-P and YouTubeVOS-P, which carry point annotations, to support this task. ClickVOS has significant practical and research value because indicating an object takes only 1-2 seconds of interaction, whereas annotating an object's mask takes several minutes. However, ClickVOS also presents increased challenges. To address this task, we propose an end-to-end baseline approach called Attention Before Segmentation (ABS), motivated by the human attention process. ABS uses the given point in the first frame to perceive the target object through a concise yet effective segmentation attention. Although the initial object mask may be inaccurate, in ABS the initially imprecise mask can self-heal as the video goes on instead of deteriorating through error accumulation. This is attributed to our designed improvement memory, which continuously records stable global object memory and updates detailed dense memory. In addition, we conduct various baseline explorations using off-the-shelf algorithms from related fields, which could provide insights for further exploration of ClickVOS. The experimental results demonstrate the superiority of the proposed ABS approach. Extended datasets and code will be available at https://github.com/PinxueGuo/ClickVOS. |
|---|---|
| Author | Zhang, Wenqiang; Li, Xiaoqiang; Chen, Zhaoyu; Gao, Shuyong; Li, Wanyun; Zhang, Wei; Hong, Lingyi; Zhou, Xinyu; Li, Jinglun; Guo, Pinxue |
| Author_xml | – sequence: 1 givenname: Pinxue orcidid: 0000-0002-4388-9757 surname: Guo fullname: Guo, Pinxue email: pxguo21@m.fudan.edu.cn organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China – sequence: 2 givenname: Lingyi orcidid: 0000-0002-2749-5133 surname: Hong fullname: Hong, Lingyi email: lyhong22@m.fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China – sequence: 3 givenname: Xinyu surname: Zhou fullname: Zhou, Xinyu email: zhouxinyu20@fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China – sequence: 4 givenname: Shuyong orcidid: 0000-0002-8992-0756 surname: Gao fullname: Gao, Shuyong email: sygao18@fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China – sequence: 5 givenname: Wanyun orcidid: 0009-0000-0669-0661 surname: Li fullname: Li, Wanyun email: wyli22@m.fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China – sequence: 6 givenname: Jinglun surname: Li fullname: Li, Jinglun email: jingli960423@gmail.com organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China – sequence: 7 givenname: Zhaoyu surname: Chen fullname: Chen, Zhaoyu email: zhaoyuchen20@fudan.edu.cn organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China – sequence: 8 givenname: Xiaoqiang orcidid: 0000-0001-7243-2783 surname: Li fullname: Li, Xiaoqiang email: xqli@shu.edu.cn organization: School of Computer Engineering and Science, Shanghai University, Shanghai, China – sequence: 9 givenname: Wei orcidid: 0000-0002-2358-8543 surname: Zhang fullname: Zhang, Wei email: weizh@fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China – sequence: 10 givenname: Wenqiang orcidid: 0000-0002-3339-8751 surname: Zhang fullname: Zhang, Wenqiang email: wqzhang@fudan.edu.cn organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China |
| CODEN | ITCTEM |
| ContentType | Journal Article |
| DOI | 10.1109/TCSVT.2025.3599005 |
| DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1558-2205 |
| EndPage | 1 |
| ExternalDocumentID | 10_1109_TCSVT_2025_3599005 11124887 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Scientific and Technological innovation action plan of Shanghai Science and Technology Committee grantid: 21DZ2203300; 22511101502; 22511102202 – fundername: National Natural Science Foundation of China grantid: 62072112 funderid: 10.13039/501100001809 |
| ISSN | 1051-8215 |
| IngestDate | Sat Nov 29 07:39:26 EST 2025 Wed Aug 27 02:00:06 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| LinkModel | DirectLink |
| ORCID | 0000-0002-8992-0756 0000-0002-2749-5133 0000-0001-7243-2783 0000-0002-4388-9757 0000-0002-3339-8751 0009-0000-0669-0661 0000-0002-2358-8543 |
| PageCount | 1 |
| ParticipantIDs | crossref_primary_10_1109_TCSVT_2025_3599005 ieee_primary_11124887 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-00-00 |
| PublicationDateYYYYMMDD | 2025-01-01 |
| PublicationDate_xml | – year: 2025 text: 2025-00-00 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE transactions on circuits and systems for video technology |
| PublicationTitleAbbrev | TCSVT |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | crossref ieee |
| SourceType | Index Database Publisher |
| StartPage | 1 |
| SubjectTerms | Accuracy; Adaptation models; Annotations; Click video object segmentation; Data mining; Feature extraction; human effort efficiency; Manuals; Motion segmentation; Object segmentation; single click; Training; video object segmentation; Videos |
| Title | ClickVOS: Click Video Object Segmentation |
| URI | https://ieeexplore.ieee.org/document/11124887 |