Design of Radix-8 Unsigned Bit Pair Recoding Algorithm-Based Floating-Point Multiplier for Neural Network Computations
| Published in: | IEEE Access, Vol. 13, pp. 63969-63980 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
| Subjects: | |
| Summary: | Neural network computations for Artificial Intelligence (AI) applications demand high-speed, low-power, and area-efficient Floating-Point (FP) multiplication. In this work, we propose an efficient unsigned Bit Pair Recoding (BPR) algorithm for FP unsigned mantissa multiplication with improved area, power, and speed. Using the BPR algorithm with parallel-processed partial product reduction, the partial product rows of an $n\times n$ binary multiplier are reduced from $n$ to $\frac{n}{4}$. The new algorithm performs partial product row reduction without the 2's complement, Negative Encoding (NE), and Sign Extension (SE) steps required by Booth-recoded multiplication; these computations are unnecessary for floating-point unsigned multiplication. The computational cost of the 2's complement circuit and the sign-bit extension of each partial product row in the Modified Booth Encoding (MBE) algorithm is thus effectively eliminated by the BPR algorithm. Unsigned mantissa multiplication using partial product array reduction with the BPR technique uses 27.5% less area, 18% less power, and is 33.33% faster at generating one partial product row than conventional Booth multipliers. The BPR binary multipliers are verified as $8\times 8$ and $16\times 16$ multipliers on a TSMC 65 nm 1.1 V CMOS standard cell library, and the synthesis reports are compared with conventional and the best-reported improved Booth multipliers. Finally, a MAC design using 16-bit FP arithmetic with an $8\times 8$ mantissa multiplier for a CNN accelerator is developed and validated with suitable error metrics such as Mean Relative Error (MRE) to assess the suggested architecture for AI applications. (An illustrative recoding sketch follows the record below.) |
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2025.3559226 |
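
To make the row-reduction idea in the abstract concrete, here is a minimal Python sketch of generic unsigned radix-8 digit recoding, assuming 3-bit multiplier digits and software-level multiples. It is not the paper's exact BPR scheme (which the authors report reduces the rows from $n$ to $\frac{n}{4}$); the function name and parameters are illustrative only. It shows how grouping unsigned multiplier bits shrinks the number of partial product rows, and why no 2's complement, negative encoding, or sign extension is needed when both operands are unsigned.

```python
# Hypothetical illustration (not the paper's exact BPR scheme): generic unsigned
# radix-8 digit recoding for an n x n multiplier. Grouping the multiplier into
# 3-bit digits yields ceil(n/3) partial product rows instead of n, and because
# both operands are unsigned, every row is non-negative, so no 2's complement,
# negative encoding, or sign extension logic is needed.

def radix8_unsigned_partial_products(a: int, b: int, n: int = 8):
    """Return the shifted partial products of a*b using unsigned radix-8 digits."""
    rows = []
    for i in range(0, n, 3):                 # one 3-bit digit per iteration
        digit = (b >> i) & 0b111             # unsigned digit in 0..7
        rows.append((a * digit) << i)        # digit-weighted, shifted multiple
    return rows

if __name__ == "__main__":
    a, b, n = 0xB7, 0x5C, 8                  # example 8-bit unsigned mantissas
    rows = radix8_unsigned_partial_products(a, b, n)
    assert sum(rows) == a * b                # the rows sum to the exact product
    print(len(rows), "partial product rows for an 8x8 multiply")  # 3 rows vs. 8
```

Summing the rows reproduces the exact product, so a carry-save reduction tree over these few non-negative rows suffices; the main hardware cost of such unsigned high-radix schemes is forming the odd multiples of the multiplicand, which is the design space the article's BPR recoding addresses.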