Knowledge distillation: A good teacher is patient and consistent
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation we do not aim to necessarily propose a new method, but strive to identify a robust and effective recipe for making state-of-the-art large scale models affordable in practice. We demonstrate that, when performed correctly, knowledge distillation can be a powerful tool for reducing the size of large models without compromising their performance. In particular, we uncover that there are certain implicit design choices, which may drastically affect the effectiveness of distillation. Our key contribution is the explicit identification of these design choices, which were not previously articulated in the literature. We back up our findings by a comprehensive empirical study, demonstrate compelling results on a wide range of vision datasets and, in particular, obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
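The objective behind the abstract's claim is the standard distillation loss: the student is trained to match the teacher's softened output distribution via KL divergence, with teacher and student seeing the same augmented view of each image ("consistent" teaching) over a very long schedule ("patient" training). Below is a minimal PyTorch sketch of that objective; the function names, the `train_step` helper, and the default temperature are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    predictions; both models must score the *same* augmented view of each
    image for the "consistent teaching" property described in the abstract."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # reduction="batchmean" computes the true mean KL over the batch;
    # the t**2 factor keeps gradient scale comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

def train_step(student, teacher, images, optimizer, temperature=1.0):
    # Hypothetical helper: `images` is one batch of augmented crops, and
    # the frozen teacher scores exactly the same crops as the student.
    with torch.no_grad():
        teacher_logits = teacher(images)
    loss = distillation_loss(student(images), teacher_logits, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the paper frames distillation as function matching, a faithful reproduction would rely on this teacher signal alone, without an additional ground-truth label loss, and would run it for far longer schedules than ordinary supervised training.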
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 10915-10924 |
|---|---|
| Main authors: | Beyer, Lucas; Zhai, Xiaohua; Royer, Amelie; Markeeva, Larisa; Anil, Rohan; Kolesnikov, Alexander (Google Research, Brain Team) |
| Medium: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 01.06.2022 |
| Subjects: | Computational modeling; Computer vision; Data models; Deep learning architectures and techniques; Efficient learning and inferences; Machine learning; Optimization methods; Representation learning; Image coding; Manifolds; Schedules; Training |
| ISSN: | 1063-6919 |
| EISBN: | 9781665469463 |
| CODEN: | IEEPAD |
| DOI: | 10.1109/CVPR52688.2022.01065 |
| Online access: | Get full text: https://ieeexplore.ieee.org/document/9879513 |