Decoupled Knowledge Distillation

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 11943–11952
Main Authors: Zhao, Borui; Cui, Quan; Song, Renjie; Qiu, Yiyu; Liang, Jiajun
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2022
Subjects: categorization; Codes; Computer architecture; Computer vision; Deep learning; Feature extraction; Object detection; retrieval; Training
ISSN: 1063-6919
Online Access: https://ieeexplore.ieee.org/document/9879819
Abstract State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we re-formulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the "difficulty" of training samples, while NCKD is the prominent reason why logit distillation works. More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly. Compared with complex feature-based methods, our DKD achieves comparable or even better results and has better training efficiency on CIFAR-100, ImageNet, and MS-COCO datasets for image classification and object detection tasks. This paper proves the great potential of logit distillation, and we hope it will be helpful for future research. The code is available at https://github.com/megvii-research/mdistiller.
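The decomposition described in the abstract is straightforward to express in code. The PyTorch sketch below is an unofficial illustration rather than the authors' implementation (see the linked repository for that); alpha, beta, and T are placeholder hyperparameter values. Classical KD implicitly couples the two terms as TCKD + (1 - teacher target-class confidence) * NCKD; DKD replaces that coupling with independent weights alpha and beta.

import torch
import torch.nn.functional as F

def dkd_loss(logits_student, logits_teacher, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD loss: alpha * TCKD + beta * NCKD (unofficial sketch)."""
    # Boolean mask that is True at each sample's target class.
    gt_mask = F.one_hot(target, num_classes=logits_student.size(1)).bool()

    # TCKD: KL divergence over the binary (target vs. all non-target) split.
    p_s = F.softmax(logits_student / T, dim=1)
    p_t = F.softmax(logits_teacher / T, dim=1)
    b_s = torch.stack([(p_s * gt_mask).sum(1), (p_s * ~gt_mask).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt_mask).sum(1), (p_t * ~gt_mask).sum(1)], dim=1)
    tckd = F.kl_div(b_s.log(), b_t, reduction="batchmean") * (T ** 2)

    # NCKD: KL divergence among non-target classes only; pushing the target
    # logit far negative re-normalizes the softmax over the remaining classes.
    log_p_s = F.log_softmax(logits_student / T - 1000.0 * gt_mask, dim=1)
    p_t_nt = F.softmax(logits_teacher / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_p_s, p_t_nt, reduction="batchmean") * (T ** 2)

    # Independent weighting is the "decoupling" the abstract refers to.
    return alpha * tckd + beta * nckd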
Author_xml – sequence: 1
  givenname: Borui
  surname: Zhao
  fullname: Zhao, Borui
  email: zhaoborui.gm@gmail.com
  organization: MEGVII Technology
– sequence: 2
  givenname: Quan
  surname: Cui
  fullname: Cui, Quan
  email: cui-quan@toki.waseda.jp
  organization: Waseda University
– sequence: 3
  givenname: Renjie
  surname: Song
  fullname: Song, Renjie
  email: songrenjie@megvii.com
  organization: MEGVII Technology
– sequence: 4
  givenname: Yiyu
  surname: Qiu
  fullname: Qiu, Yiyu
  email: chouyy18@mails.tsinghua.edu.cn
  organization: MEGVII Technology
– sequence: 5
  givenname: Jiajun
  surname: Liang
  fullname: Liang, Jiajun
  email: liangjiajun@megvii.com
  organization: MEGVII Technology
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR52688.2022.01165
Discipline Applied Sciences
EISBN 1665469463
9781665469463
EISSN 1063-6919
EndPage 11952
ExternalDocumentID 9879819
Genre orig-research
ISICitedReferencesCount 541
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2022-June
PublicationDateYYYYMMDD 2022-06-01
PublicationDate_xml – month: 06
  year: 2022
  text: 2022-June
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 11943
SubjectTerms categorization
Codes
Computer architecture
Computer vision
Deep learning
Deep learning architectures and techniques; Efficient learning and inferences; Recognition: detection
Feature extraction
Object detection
retrieval
Training
Title Decoupled Knowledge Distillation
URI https://ieeexplore.ieee.org/document/9879819