Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks

Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level f...

Bibliographic Details
Published in:	2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717-1724
Main authors:	Oquab, Maxime; Bottou, Leon; Laptev, Ivan; Sivic, Josef
Format:	Conference paper; Journal Article
Language:	English
Published:	IEEE, 1 June 2014
Subjects:	Computer vision; Image recognition; Image representation; Neural networks; Object recognition; Pascal (programming language); Pattern recognition; Representations; Training; Training data; Visual; Visualization; Volatile organic compounds
ISSN:	1063-6919

Abstract	Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute a mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
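
The transfer idea in the abstract (reuse layers trained on a large source dataset, then train only a new part on a small target dataset) can be illustrated with a toy sketch. This is not the authors' architecture or code: the "pre-trained" layers here are four hand-picked weight rows standing in for ImageNet-trained CNN layers, the target data are made-up 2-D points standing in for PASCAL VOC images, and every name (`FROZEN_LAYERS`, `extract_features`, `train_new_layer`, `predict`) is hypothetical.

```python
import math

# Hypothetical stand-in for layers pre-trained on a large source dataset.
# In the setting the abstract describes these would be CNN layers trained on
# ImageNet; here they are four hand-picked weight rows so the example runs
# without any data or third-party libraries.
FROZEN_LAYERS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def extract_features(frozen, x):
    """Map an input through the frozen layers (linear map followed by ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_new_layer(features, labels, epochs=200, lr=0.1):
    """Fit a new logistic output layer on the transferred features with SGD."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(frozen, w, b, x):
    """Classify x using the frozen features plus the newly trained layer."""
    f = extract_features(frozen, x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Small labeled target set (made-up points; the label is 1 when x[0] > 0).
TARGET_X = [[1.0, 0.5], [2.0, -1.0], [-1.0, 0.3], [-2.0, -0.5]]
TARGET_Y = [1, 1, 0, 0]

# Only the new layer is trained; FROZEN_LAYERS is never modified.
target_features = [extract_features(FROZEN_LAYERS, x) for x in TARGET_X]
new_w, new_b = train_new_layer(target_features, TARGET_Y)
print([predict(FROZEN_LAYERS, new_w, new_b, x) for x in TARGET_X])
```

The design point the sketch tries to capture is that the number of parameters estimated from the small target set is only that of the new layer (five values here), while the reused layers contribute a fixed representation.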
Author	Oquab, Maxime; Bottou, Leon; Laptev, Ivan; Sivic, Josef
Author_xml	– sequence: 1; givenname: Maxime; surname: Oquab; fullname: Oquab, Maxime; organization: INRIA, Paris, France
	– sequence: 2; givenname: Leon; surname: Bottou; fullname: Bottou, Leon; organization: MSR, New York, NY, USA
	– sequence: 3; givenname: Ivan; surname: Laptev; fullname: Laptev, Ivan; organization: INRIA, Paris, France
	– sequence: 4; givenname: Josef; surname: Sivic; fullname: Sivic, Josef; organization: INRIA, Paris, France
CODEN	IEEPAD
ContentType	Conference Proceeding Journal Article
DBID	6IE 6IH CBEJK RIE RIO 7SC 7SP 8FD JQ2 L7M L~C L~D
DOI	10.1109/CVPR.2014.222
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional
DatabaseTitle	Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional
DatabaseTitleList	Technology Research Database
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences Computer Science
EISBN	9781479951185 1479951188
EISSN	1063-6919
EndPage	1724
ExternalDocumentID	6909618
Genre	orig-research
GroupedDBID	23M 29F 29O 6IE 6IH 6IK ABDPE ACGFS ALMA_UNASSIGNED_HOLDINGS CBEJK IPLJI M43 RIE RIO RNS 7SC 7SP 8FD JQ2 L7M L~C L~D
ID	FETCH-LOGICAL-i274t-9ca5a9548c9d1de24c08153dae7872135657fe3f0efadb2f523f65aee1dcaec3
IEDL.DBID	RIE
ISICitedReferencesCount	2088
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000361555601097&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	1063-6919
IngestDate	Thu Oct 02 07:08:15 EDT 2025 Wed Aug 27 04:30:17 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i274t-9ca5a9548c9d1de24c08153dae7872135657fe3f0efadb2f523f65aee1dcaec3
Notes	ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Conference-1 ObjectType-Feature-3 content type line 23 SourceType-Conference Papers & Proceedings-2
PQID	1677905608
PQPubID	23500
PageCount	8
ParticipantIDs	ieee_primary_6909618 proquest_miscellaneous_1677905608
PublicationCentury	2000
PublicationDate	20140601
PublicationDateYYYYMMDD	2014-06-01
PublicationDate_xml	– month: 06 year: 2014 text: 20140601 day: 01
PublicationDecade	2010
PublicationTitle	2014 IEEE Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev	CVPR
PublicationYear	2014
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssib026764536 ssj0023720 ssj0003211698
Score	2.5456948
Snippet	Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge...
SourceID	proquest ieee
SourceType	Aggregation Database Publisher
StartPage	1717
SubjectTerms	Computer vision; Image recognition; Image representation; Neural networks; Object recognition; Pascal (programming language); Pattern recognition; Representations; Training; Training data; Visual; Visualization; Volatile organic compounds
Title	Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
URI	https://ieeexplore.ieee.org/document/6909618 https://www.proquest.com/docview/1677905608
WOSCitedRecordID	wos000361555601097&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText	1
inHoldings	1
isFullTextHit
isPrint