Knowledge distillation: A good teacher is patient and consistent

Detailed Bibliography
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) [CVPR], pp. 10915-10924
Main Authors: Beyer, Lucas; Zhai, Xiaohua; Royer, Amelie; Markeeva, Larisa; Anil, Rohan; Kolesnikov, Alexander
Format: Conference Paper
Language: English
Published: IEEE, 2022-06-01
ISSN: 1063-6919
Online Access: Get full text (https://ieeexplore.ieee.org/document/9879513)
Abstract: There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation we do not aim to necessarily propose a new method, but strive to identify a robust and effective recipe for making state-of-the-art large-scale models affordable in practice. We demonstrate that, when performed correctly, knowledge distillation can be a powerful tool for reducing the size of large models without compromising their performance. In particular, we uncover that there are certain implicit design choices, which may drastically affect the effectiveness of distillation. Our key contribution is the explicit identification of these design choices, which were not previously articulated in the literature. We back up our findings with a comprehensive empirical study, demonstrate compelling results on a wide range of vision datasets and, in particular, obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
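As a reading aid, here is a minimal PyTorch sketch of the kind of distillation step the abstract describes: the student is trained to match the teacher's temperature-softened output distribution, with teacher and student receiving the identical (augmented) input, which reflects the "consistent" part of the title. The temperature value, model handling, and function names below are illustrative assumptions, not the paper's exact recipe.

import torch
import torch.nn.functional as F
from torch import nn

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    # KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient magnitudes comparable
    # across temperature settings.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def distillation_step(student: nn.Module, teacher: nn.Module,
                      images: torch.Tensor,
                      optimizer: torch.optim.Optimizer) -> float:
    # Both networks see the *same* augmented batch: consistency between
    # teacher and student views is one of the design choices the
    # abstract alludes to.
    with torch.no_grad():  # the teacher is frozen during distillation
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The "patient" part of the title corresponds to running such a loop for much longer schedules than typical supervised training; the sketch deliberately leaves the schedule and the augmentation pipeline to the caller.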
Authors and Affiliations:
1. Lucas Beyer, Google Research, Brain Team (lbeyer@google.com)
2. Xiaohua Zhai, Google Research, Brain Team (xzhai@google.com)
3. Amelie Royer, Google Research, Brain Team
4. Larisa Markeeva, Google Research, Brain Team
5. Rohan Anil, Google Research, Brain Team
6. Alexander Kolesnikov, Google Research, Brain Team (akolesnikov@google.com)
CODEN: IEEPAD
DOI: 10.1109/CVPR52688.2022.01065
Discipline: Applied Sciences
EISBN: 1665469463; 9781665469463
EISSN: 1063-6919
Pages: 10915-10924 (10 pages)
External Document ID: 9879513
Genre: Original research
Web of Science Cited References Count: 138
Subject Terms: Computational modeling; Computer vision; Data models; Deep learning architectures and techniques; Efficient learning and inferences; Machine learning; Optimization methods; Representation learning; Image coding; Manifolds; Schedules; Training