CIDEr: Consensus-based image description evaluation

Bibliographic details
Published in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4566-4575
Main authors: Vedantam, Ramakrishna; Zitnick, C. Lawrence; Parikh, Devi
Format: Conference paper, Journal article
Language: English
Published: IEEE, 1 June 2015
ISSN: 1063-6919
Abstract Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric that captures consensus, and two new datasets: PASCAL-50S and ABSTRACT-50S that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as a part of MS COCO evaluation server to enable systematic evaluation and benchmarking.
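The abstract describes an automated metric that captures consensus but does not give its formula. As a rough illustration only, the sketch below scores a candidate sentence by the TF-IDF-weighted n-gram cosine similarity to each reference sentence, averaged over references and over n-gram orders 1 to 4, which is how CIDEr is commonly summarized. All function names here are hypothetical, tokenization is plain whitespace splitting, and the preprocessing and CIDEr-D adjustments used by the MS COCO evaluation server are omitted, so the numbers it produces are not comparable to published scores.

```python
import math
from collections import Counter


def ngrams(sentence, n):
    """Count the n-grams of a lowercased, whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def document_frequency(corpus_refs, n):
    """For each n-gram, count the images whose references contain it."""
    df = Counter()
    for refs in corpus_refs:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, n))
        df.update(seen)
    return df


def tfidf(counts, df, num_images):
    """Weight normalized term frequency by inverse document frequency."""
    total = sum(counts.values())
    return {
        gram: (count / total) * math.log(num_images / max(1, df[gram]))
        for gram, count in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def consensus_score(candidate, refs, corpus_refs, max_n=4):
    """Average TF-IDF cosine similarity over references and n-gram orders.

    Simplified sketch (an assumption, not the reference implementation).
    """
    num_images = len(corpus_refs)
    per_order = []
    for n in range(1, max_n + 1):
        df = document_frequency(corpus_refs, n)
        cand_vec = tfidf(ngrams(candidate, n), df, num_images)
        sims = [cosine(cand_vec, tfidf(ngrams(r, n), df, num_images)) for r in refs]
        per_order.append(sum(sims) / len(sims))
    return sum(per_order) / max_n


# Toy corpus: one list of reference sentences per image (invented for illustration).
corpus_refs = [
    ["a dog runs on the grass", "a brown dog is running outside"],
    ["a man rides a bicycle", "a cyclist on the road"],
    ["two cats sleep on a sofa", "cats resting on the couch"],
]
good = consensus_score("a dog runs on the grass", corpus_refs[0], corpus_refs)
bad = consensus_score("a man rides a bicycle", corpus_refs[0], corpus_refs)
```

In this toy setup, n-grams that occur in the references of every image (such as "a" or "on the") receive zero weight, so the score is driven by the less common n-grams that the candidate shares with the references for the same image.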
Author_xml – sequence: 1
  givenname: Ramakrishna
  surname: Vedantam
  fullname: Vedantam, Ramakrishna
  email: vrama91@vt.edu
  organization: Virginia Tech, Blacksburg, VA, USA
– sequence: 2
  givenname: C. Lawrence
  surname: Zitnick
  fullname: Zitnick, C. Lawrence
  email: larryz@microsoft.com
  organization: Microsoft Res., Redmond, WA, USA
– sequence: 3
  givenname: Devi
  surname: Parikh
  fullname: Parikh, Devi
  email: parikh@vt.edu
  organization: Virginia Tech, Blacksburg, VA, USA
ContentType Conference Proceeding
Journal Article
DOI 10.1109/CVPR.2015.7299087
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Computer Science
EISBN 1467369640
9781467369640
EISSN 1063-6919
EndPage 4575
ExternalDocumentID 7299087
Genre orig-research
ISICitedReferencesCount 2825
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000387959204066&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1063-6919
IngestDate Fri Sep 05 10:08:47 EDT 2025
Wed Aug 27 02:49:18 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 20150601
PublicationDateYYYYMMDD 2015-06-01
PublicationDate_xml – month: 06
  year: 2015
  text: 20150601
  day: 01
PublicationDecade 2010
PublicationTitle 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
SourceID proquest
ieee
SourceType Aggregation Database
Publisher
StartPage 4566
SubjectTerms Accuracy
Benchmarking
Cider
Computer vision
Conferences
Correlation
Human
Measurement
Pattern recognition
Protocols
Sentences
Silicon
Testing
Training
Title CIDEr: Consensus-based image description evaluation
URI https://ieeexplore.ieee.org/document/7299087
https://www.proquest.com/docview/1770347889
WOSCitedRecordID wos000387959204066&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D