Knowledge Distillation Based on Instance Spectral Relations

Detailed Bibliography
Title: Knowledge Distillation Based on Instance Spectral Relations
Authors: ZHANG Zhengxiu, ZHOU Chun, YANG Meng
Source: Jisuanji gongcheng, Vol. 51, Iss. 11, pp. 63-71 (2025)
Publisher Information: Editorial Office of Computer Engineering, 2025.
Publication Year: 2025
Collection: LCC: Computer engineering. Computer hardware; LCC: Computer software
Subjects: knowledge distillation (KD), attention transfer, instance spectral relation, spectral graph structure, manifold learning, Computer engineering. Computer hardware, TK7885-7895, Computer software, QA76.75-76.765
Description: The core challenge of Knowledge Distillation (KD) lies in extracting generic and sufficient knowledge from the Teacher model to effectively guide the learning of the Student model. Recent studies have found that, beyond learning soft labels, further exploiting inter-instance relations in the deep feature space helps improve the performance of Student models. Existing inter-instance relation-based KD methods widely adopt global Euclidean distance metrics to measure the affinity between instances. However, these methods overlook the intrinsic embedding characteristics of the deep feature space, where high-dimensional data lie on a low-dimensional manifold that is locally approximately Euclidean but globally complex. To address this issue, a novel instance spectral relation-based KD method is proposed. This strategy overcomes the limitations of the global Euclidean distance and instead constructs and analyzes similarity matrices between each instance and its k nearest neighbors in the Teacher model's feature space to reveal latent spectral graph structure information. A new loss function is designed to guide the Student model not only to learn the probability distribution output by the Teacher model but also to mimic the inter-instance relations represented by this spectral graph structure. Experimental results demonstrate that the proposed method significantly improves the performance of the Student model, with an average classification accuracy improvement of 2.33 percentage points over baseline methods. These findings indicate the importance and effectiveness of incorporating the spectral graph structure relation between samples into the KD process.
Document Type: article
File Description: electronic resource
Language: English; Chinese
ISSN: 1000-3428
Relation: https://www.ecice06.com/fileup/1000-3428/PDF/jsjgc-51-11-63.pdf; https://doaj.org/toc/1000-3428
DOI: 10.19678/j.issn.1000-3428.0069690
Access URL: https://doaj.org/article/e13616f3c20d423f81f47ae2c1a1fa4d
Accession Number: edsdoj.13616f3c20d423f81f47ae2c1a1fa4d
Database: Directory of Open Access Journals
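Note: The abstract above outlines a general recipe: build k-nearest-neighbor similarity graphs in the Teacher's feature space, extract their spectral graph structure, and add a loss that makes the Student reproduce both the Teacher's soft labels and this inter-instance structure. The paper's exact formulation is not given in this record, so the following is a minimal PyTorch sketch of that idea only, assuming a Gaussian-kernel k-NN affinity, a symmetrically normalized graph Laplacian as the spectral representation, and an MSE term for matching it. The function names and hyperparameters (k, sigma, T, alpha, beta) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F


def knn_affinity(feats, k=8, sigma=1.0):
    # Gaussian-kernel similarity restricted to each instance's k nearest neighbors.
    feats = F.normalize(feats, dim=1)
    dist = torch.cdist(feats, feats)                         # pairwise Euclidean distances, shape (B, B)
    w = torch.exp(-dist.pow(2) / (2 * sigma ** 2))           # kernel affinities
    nn_idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # drop self-match (distance 0)
    mask = torch.zeros_like(w).scatter_(1, nn_idx, 1.0)
    return w * ((mask + mask.t()) > 0).float()               # symmetrized k-NN graph


def normalized_laplacian(w):
    # Symmetrically normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = w.sum(dim=1).clamp_min(1e-8).pow(-0.5)
    return torch.eye(w.size(0), device=w.device) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]


def spectral_relation_kd_loss(t_logits, s_logits, t_feats, s_feats,
                              T=4.0, alpha=1.0, beta=1.0, k=8):
    # Standard soft-label distillation term (Hinton-style KL divergence at temperature T).
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # Spectral relation term: match the batch graph structure of Teacher and Student features.
    l_teacher = normalized_laplacian(knn_affinity(t_feats.detach(), k))
    l_student = normalized_laplacian(knn_affinity(s_feats, k))
    spectral = F.mse_loss(l_student, l_teacher)
    return alpha * kd + beta * spectral

Because the matching operates on B×B batch-level matrices, it is independent of the (typically different) Teacher and Student feature dimensions, which is one practical reason relation-based KD methods compare graphs rather than raw features.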