Hardware Implementation of Approximate Fixed-point Divider for Machine Learning Optimization Algorithm

Bibliographic Details
Published in:	Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (Online), pp. 22-25
Main Authors:	Han, Gandong; Zhang, Weiyi; Niu, Liting; Zhang, Chun; Wang, Zhihua; Wang, Ziqiang
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 11 November 2022
Subjects:	Approximation algorithms; Delays; fast square root; fixed-point division; Hardware; hardware acceleration; Libraries; Machine learning; Machine learning algorithms; optimization algorithm; Throughput
ISSN:	2159-2160
Online Access:	Full text

Description
Summary:	Division is necessary for many applications, especially optimization algorithms for machine learning. A certain degree of loss is usually acceptable when calculating nonsignificant intermediate variables, in exchange for a considerable speed improvement. This paper proposes a specialized divider to accelerate hardware implementations of machine learning optimization algorithms. Inspired by the fast inverse square root algorithm, we designed a hardware implementation method that generates an approximate division result using multiplication and conversion between floating-point and fixed-point numbers. The paper includes three versions of the divider: fastDiv_accuracy, a conventional design with 35% less delay and minimal error compared to the delay-minimized standard divider from the Synopsys DesignWare library; fastDiv_area, an area-oriented design with 67% less delay and acceptable error compared to the standard divider constrained to the same area; and fastDiv_speed, the fastest design, with 54% less delay than the delay-minimized standard divider. All three versions can be applied on demand when deploying optimization algorithms in FPGA or ASIC designs.
DOI:	10.1109/PrimeAsia56064.2022.10104001
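
The summary describes an approximate divider built from float/fixed-point conversion and multiplication, inspired by the fast inverse square root algorithm. The record gives no circuit details, so the following is only a minimal software sketch of the general idea, not the paper's design. It assumes a Q16.16 fixed-point format, a commonly cited magic constant for the reciprocal bit trick, and one optional Newton-Raphson refinement step; the names `FRAC_BITS`, `MAGIC`, `approx_reciprocal`, and `approx_fixed_div` are hypothetical and chosen for illustration.

```python
import struct

FRAC_BITS = 16       # assumption: Q16.16 fixed-point operands
MAGIC = 0x7EF311C7   # assumption: commonly cited constant, not taken from the paper


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def approx_reciprocal(x: float, refine: bool = True) -> float:
    """Approximate 1/x for positive x by integer subtraction on the float bits."""
    if x <= 0.0:
        raise ValueError("this sketch only handles positive divisors")
    # Subtracting the bit pattern from a constant roughly negates the exponent
    # and gives a linear estimate of the mantissa of 1/x.
    r = bits_to_float(MAGIC - float_to_bits(x))
    if refine:
        # One Newton-Raphson step for f(r) = 1/r - x; costs two multiplications.
        r = r * (2.0 - x * r)
    return r


def approx_fixed_div(num: int, den: int) -> int:
    """Approximate num / den, all values in Q16.16, using only a multiply."""
    den_f = den / (1 << FRAC_BITS)                       # fixed -> float
    r_fixed = int(approx_reciprocal(den_f) * (1 << FRAC_BITS))  # float -> fixed
    return (num * r_fixed) >> FRAC_BITS                  # Q16.16 multiply


if __name__ == "__main__":
    q = approx_fixed_div(10 << FRAC_BITS, 4 << FRAC_BITS)
    print(q / (1 << FRAC_BITS))  # close to 2.5, with a small approximation error
```

Dropping the refinement step (`refine=False`) trades accuracy for fewer multiplications, which loosely mirrors the accuracy/area/speed trade-off among the three divider versions named in the summary; the specific trade-offs in the paper are not reproduced here.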