Hardware Implementation of Approximate Fixed-point Divider for Machine Learning Optimization Algorithm


Bibliographic details
Published in: Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (Online), pp. 22-25
Main authors: Han, Gandong; Zhang, Weiyi; Niu, Liting; Zhang, Chun; Wang, Zhihua; Wang, Ziqiang
Format: Conference proceedings
Language: English
Published: IEEE, 11 November 2022
ISSN:2159-2160
Description
Summary: Division is necessary for many applications, especially optimization algorithms for machine learning. A certain degree of accuracy loss is usually acceptable when calculating nonsignificant intermediate variables, in exchange for a considerable speed improvement. This paper proposes a specialized divider to accelerate hardware implementations of machine learning optimization algorithms. Inspired by the fast inverse square root algorithm, we designed a hardware implementation that generates an approximate division result using conversion between floating-point and fixed-point numbers and multiplication. This paper includes three versions of the divider: fastDiv_accuracy, a conventional design with 35% less delay and minimal error compared to the delay-minimized standard divider from the Synopsys DesignWare library; fastDiv_area, an area-oriented design with 67% less delay and acceptable error compared to the standard divider constrained to the same area; and fastDiv_speed, the fastest design, with 54% less delay compared to the delay-minimized standard divider. All three versions can be applied on demand when deploying optimization algorithms in FPGA or ASIC designs.
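The general idea named in the summary (an approximate reciprocal obtained by reinterpreting floating-point bits, in the spirit of the fast inverse square root trick, followed by a multiplication) can be illustrated with a minimal software sketch. This is not the paper's hardware design: the Q16.16 fixed-point format, the untuned constant `MAGIC`, the optional Newton-Raphson refinement, and the function names `approx_reciprocal` and `approx_div_fixed` are all assumptions made here for illustration.

```python
import struct

FRAC_BITS = 16      # assumed Q16.16 fixed-point format (not taken from the paper)
MAGIC = 0x7F000000  # 2 * (127 << 23): untuned constant derived from the float32 exponent bias


def approx_reciprocal(x: float, newton_steps: int = 0) -> float:
    """Approximate 1/x for a positive float32-representable x (illustrative only)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    if bits >= MAGIC:
        raise ValueError("x is too large for this sketch")
    # The bit pattern behaves like a scaled, biased log2(x); subtracting it
    # from the constant negates that logarithm, which approximates 1/x.
    (y,) = struct.unpack("<f", struct.pack("<I", MAGIC - bits))
    # Optional Newton-Raphson refinement (an assumption, not stated in the summary).
    for _ in range(newton_steps):
        y = y * (2.0 - x * y)
    return y


def approx_div_fixed(a: int, b: int, newton_steps: int = 0) -> int:
    """Approximate a / b for Q16.16 operands with b > 0, returning a Q16.16 result."""
    b_float = b / (1 << FRAC_BITS)                 # fixed-point -> floating-point
    recip = approx_reciprocal(b_float, newton_steps)
    recip_fixed = int(recip * (1 << FRAC_BITS))    # floating-point -> fixed-point
    return (a * recip_fixed) >> FRAC_BITS          # a single multiplication replaces the division


if __name__ == "__main__":
    one = 1 << FRAC_BITS
    for steps in (0, 1, 2):
        q = approx_div_fixed(6 * one, 3 * one, steps)
        print(steps, q / one)
```

With the untuned constant, the raw estimate overshoots the true reciprocal by at most about 12.5%, and each refinement step roughly squares the relative error. A tuned constant or a different refinement strategy would trade accuracy against delay and area differently; the summary does not say which choices the three fastDiv variants make.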
DOI:10.1109/PrimeAsia56064.2022.10104001