ClickVOS: Click Video Object Segmentation

Detailed bibliography
Published in: IEEE Transactions on Circuits and Systems for Video Technology, p. 1
Main authors: Guo, Pinxue, Hong, Lingyi, Zhou, Xinyu, Gao, Shuyong, Li, Wanyun, Li, Jinglun, Chen, Zhaoyu, Li, Xiaoqiang, Zhang, Wei, Zhang, Wenqiang
Format: Journal Article
Language: English
Publisher: IEEE, 2025
ISSN:1051-8215, 1558-2205
Abstract Video Object Segmentation (VOS) aims to segment objects in videos. However, previous settings either require time-consuming manual masks of the target objects in the first frame during inference or lack the flexibility to specify arbitrary objects of interest. To address these limitations, we propose a setting named Click Video Object Segmentation (ClickVOS), which segments objects of interest across the whole video from a single click per object in the first frame. We also provide the extended datasets DAVIS-P and YouTubeVOS-P, which add point annotations to support this task. ClickVOS has significant practical and research value because indicating an object takes only 1-2 seconds of interaction, whereas annotating an object's mask takes several minutes. However, ClickVOS also poses greater challenges. To address this task, we propose an end-to-end baseline approach named Attention Before Segmentation (ABS), motivated by the human attention process. ABS uses the given point in the first frame to perceive the target object through a concise yet effective segmentation attention. Although the initial object mask may be inaccurate, in ABS the imprecise mask can self-heal as the video progresses instead of deteriorating through error accumulation. This is attributed to our improvement memory, which continuously records a stable global object memory and updates a detailed dense memory. In addition, we explore various baselines built on off-the-shelf algorithms from related fields, which may provide insights for further exploration of ClickVOS. The experimental results demonstrate the superiority of the proposed ABS approach. The extended datasets and code will be available at https://github.com/PinxueGuo/ClickVOS.
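The abstract specifies the ClickVOS input only as a single click per object in the first frame. As a hedged illustration of that input contract, the sketch below stores one click per object and rasterizes each click into a small binary disk map, a convention common in interactive segmentation. The `Click` record, the `click_maps` function, and the `radius` parameter are hypothetical. The abstract does not say how ABS encodes clicks, and this is not the DAVIS-P or YouTubeVOS-P annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One first-frame click marking an object (hypothetical record, not the dataset format)."""
    object_id: int
    x: int  # column index in pixels
    y: int  # row index in pixels


def click_maps(clicks, height, width, radius=3):
    """Rasterize each click into a binary disk map, one map per object.

    Returns a dict mapping object_id to a height x width grid of 0/1 values.
    The disk encoding is an assumed convention, not the ABS design.
    """
    maps = {}
    for c in clicks:
        if not (0 <= c.x < width and 0 <= c.y < height):
            raise ValueError(f"click for object {c.object_id} lies outside the frame")
        if c.object_id in maps:
            # The ClickVOS setting allows exactly one click per object.
            raise ValueError(f"second click given for object {c.object_id}")
        grid = [[0] * width for _ in range(height)]
        # Mark every pixel within `radius` (Euclidean) of the click, clipped to the frame.
        for r in range(max(0, c.y - radius), min(height, c.y + radius + 1)):
            for col in range(max(0, c.x - radius), min(width, c.x + radius + 1)):
                if (r - c.y) ** 2 + (col - c.x) ** 2 <= radius ** 2:
                    grid[r][col] = 1
        maps[c.object_id] = grid
    return maps


# Usage: two objects, each marked by one click in a 10 x 12 first frame.
clicks = [Click(object_id=1, x=4, y=4), Click(object_id=2, x=9, y=1)]
maps = click_maps(clicks, height=10, width=12, radius=1)
```

With `radius=1`, each interior click covers the clicked pixel and its four direct neighbours. Diagonal pixels lie at squared distance 2, which is greater than 1, so they are excluded.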
Author Zhang, Wenqiang
Li, Xiaoqiang
Chen, Zhaoyu
Gao, Shuyong
Li, Wanyun
Zhang, Wei
Hong, Lingyi
Zhou, Xinyu
Li, Jinglun
Guo, Pinxue
Author_xml – sequence: 1
  givenname: Pinxue
  orcidid: 0000-0002-4388-9757
  surname: Guo
  fullname: Guo, Pinxue
  email: pxguo21@m.fudan.edu.cn
  organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China
– sequence: 2
  givenname: Lingyi
  orcidid: 0000-0002-2749-5133
  surname: Hong
  fullname: Hong, Lingyi
  email: lyhong22@m.fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
– sequence: 3
  givenname: Xinyu
  surname: Zhou
  fullname: Zhou, Xinyu
  email: zhouxinyu20@fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
– sequence: 4
  givenname: Shuyong
  orcidid: 0000-0002-8992-0756
  surname: Gao
  fullname: Gao, Shuyong
  email: sygao18@fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
– sequence: 5
  givenname: Wanyun
  orcidid: 0009-0000-0669-0661
  surname: Li
  fullname: Li, Wanyun
  email: wyli22@m.fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
– sequence: 6
  givenname: Jinglun
  surname: Li
  fullname: Li, Jinglun
  email: jingli960423@gmail.com
  organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China
– sequence: 7
  givenname: Zhaoyu
  surname: Chen
  fullname: Chen, Zhaoyu
  email: zhaoyuchen20@fudan.edu.cn
  organization: Academy for Engineering&Technology, Shanghai Engineering Research Center of AI&Robotics, Fudan University, Shanghai, China
– sequence: 8
  givenname: Xiaoqiang
  orcidid: 0000-0001-7243-2783
  surname: Li
  fullname: Li, Xiaoqiang
  email: xqli@shu.edu.cn
  organization: School of Computer Engineering and Science, Shanghai University, Shanghai, China
– sequence: 9
  givenname: Wei
  orcidid: 0000-0002-2358-8543
  surname: Zhang
  fullname: Zhang, Wei
  email: weizh@fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
– sequence: 10
  givenname: Wenqiang
  orcidid: 0000-0002-3339-8751
  surname: Zhang
  fullname: Zhang, Wenqiang
  email: wqzhang@fudan.edu.cn
  organization: School of Computer Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
CODEN ITCTEM
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/TCSVT.2025.3599005
DatabaseName IEEE Xplore (IEEE)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2205
EndPage 1
ExternalDocumentID 10_1109_TCSVT_2025_3599005
11124887
Genre orig-research
GrantInformation_xml – fundername: Scientific and Technological innovation action plan of Shanghai Science and Technology Committee
  grantid: 21DZ2203300; 22511101502; 22511102202
– fundername: National Natural Science Foundation of China
  grantid: 62072112
  funderid: 10.13039/501100001809
IEDL.DBID RIE
ISSN 1051-8215
IngestDate Sat Nov 29 07:39:26 EST 2025
Wed Aug 27 02:00:06 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
ORCID 0000-0002-8992-0756
0000-0002-2749-5133
0000-0001-7243-2783
0000-0002-4388-9757
0000-0002-3339-8751
0009-0000-0669-0661
0000-0002-2358-8543
PageCount 1
ParticipantIDs crossref_primary_10_1109_TCSVT_2025_3599005
ieee_primary_11124887
PublicationCentury 2000
PublicationDate 2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025-00-00
PublicationDecade 2020
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0014847
Score 2.4581053
SourceID crossref
ieee
SourceType Index Database
Publisher
StartPage 1
SubjectTerms Accuracy
Adaptation models
Annotations
Click video object segmentation
Data mining
Feature extraction
human effort efficiency
Manuals
Motion segmentation
Object segmentation
single click
Training
video object segmentation
Videos
Title ClickVOS: Click Video Object Segmentation
URI https://ieeexplore.ieee.org/document/11124887
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1558-2205
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014847
  issn: 1051-8215
  databaseCode: RIE
  dateStart: 19910101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE
linkProvider IEEE