DP-Nets: Dynamic programming assisted quantization schemes for DNN compression and acceleration
| Published in: | Integration (Amsterdam), Vol. 82, pp. 147–154 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 01.01.2022 |
| Subjects: | |
| ISSN: | 0167-9260, 1872-7522 |
| Summary: | In this work, we present effective quantization schemes called DP-Nets for the compression and acceleration of deep neural networks (DNNs). A key ingredient is a novel dynamic programming (DP) based algorithm that computes the optimal solution of scalar K-means clustering. Building on it, we propose two weight quantization approaches for compressing standard DNNs: DPR, which uses regularization, and DPQ, which uses a quantization function. We also present a DP-Nets-based technique for inference acceleration. Experiments show that DP-Nets produce models with higher inference accuracy than recently proposed counterparts while achieving the same or larger compression ratios. We further extend the schemes to compressing robust DNNs; the corresponding experiments show 16X compression of the robust ResNet-18 model with less than 3% accuracy drop on both natural and adversarial examples. FPGA experiments demonstrate that the inference-acceleration technique brings over 5X speedup on matrix–vector multiplication. |
|---|---|
| DOI: | 10.1016/j.vlsi.2021.10.002 |

Highlights:
• A dynamic programming (DP) method for the scalar K-means problem is proposed.
• Two DP-based methods are proposed for DNN compression and acceleration.
• The two methods are extended for compressing robust DNNs.
• The technique for inference acceleration with compressed DNNs is validated on FPGA.
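The record highlights a DP algorithm that solves scalar K-means exactly but does not spell it out. For orientation, below is a minimal sketch of the classical O(k·n²) dynamic program for optimal one-dimensional k-means on sorted data; the function name `dp_kmeans_1d`, the interface, and the complexity are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dp_kmeans_1d(x, k):
    """Exact 1-D k-means via dynamic programming (O(k * n^2) sketch).

    x : 1-D array of scalars (e.g., the weights of one DNN layer), assumes len(x) >= k
    k : number of clusters (codebook size)
    Returns (centers, labels) minimizing the within-cluster sum of squared errors;
    labels refer to the sorted array.
    """
    x = np.sort(np.asarray(x, dtype=np.float64))
    n = len(x)
    # Prefix sums of x and x^2 give the SSE of any contiguous segment in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i, j):  # cost of putting x[i..j] (inclusive) into one cluster
        m = j - i + 1
        seg = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - seg * seg / m

    INF = float("inf")
    # cost[c][i]: optimal cost of clustering x[0..i] into c+1 clusters.
    cost = np.full((k, n), INF)
    split = np.zeros((k, n), dtype=int)  # start index of the last cluster, for backtracking
    for i in range(n):
        cost[0][i] = sse(0, i)
    for c in range(1, k):
        for i in range(c, n):
            for j in range(c, i + 1):  # cluster c covers x[j..i]
                cand = cost[c - 1][j - 1] + sse(j, i)
                if cand < cost[c][i]:
                    cost[c][i] = cand
                    split[c][i] = j
    # Backtrack the optimal segment boundaries to recover centers and labels.
    centers, labels = [], np.empty(n, dtype=int)
    i = n - 1
    for c in range(k - 1, -1, -1):
        j = split[c][i] if c > 0 else 0
        centers.append((s1[i + 1] - s1[j]) / (i - j + 1))
        labels[j:i + 1] = c
        i = j - 1
    return np.array(centers[::-1]), labels
```

Because an optimal 1-D clustering uses contiguous segments of the sorted data, the DP over segment boundaries is exact, unlike Lloyd-style k-means, which can stall in local minima; this optimality is what the abstract credits for the accuracy of the resulting quantization.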
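The abstract reports over 5X speedup on matrix–vector multiplication with compressed models on FPGA, but the record does not describe the kernel. A common way clustered weights accelerate a matrix–vector product is to accumulate the input entries that share a codebook entry and multiply each partial sum by its center once, cutting multiplications from m·n to m·k. A NumPy sketch under that assumption (the name `quantized_matvec`, the code/center layout, and the per-row loop are all illustrative, not the paper's kernel):

```python
import numpy as np

def quantized_matvec(codes, centers, x):
    """y = W @ x where W is stored as a codebook of shared weight values.

    codes   : (m, n) int array; codes[i, j] indexes into `centers`
    centers : (k,) float array of quantized weight values
    x       : (n,) input vector
    """
    m, _ = codes.shape
    k = len(centers)
    y = np.zeros(m)
    for i in range(m):
        # Sum the input entries that share the same quantized weight...
        partial = np.zeros(k)
        np.add.at(partial, codes[i], x)
        # ...then spend only k multiplications for this output element.
        y[i] = partial @ centers
    return y
```

On hardware this grouping is what lets a small codebook replace per-weight multipliers with adders; here it only illustrates the arithmetic rearrangement behind such a speedup.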