Design of Radix-8 Unsigned Bit Pair Recoding Algorithm-Based Floating-Point Multiplier for Neural Network Computations
| Published in: | IEEE Access, Volume 13, pp. 63969-63980 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
| ISSN: | 2169-3536 |
| Summary: | Neural network computations for Artificial Intelligence (AI) applications demand high-speed, low-power, and area-efficient Floating-Point (FP) multiplication. In this work, we propose an efficient unsigned Bit Pair Recoding (BPR) algorithm for area-, power-, and speed-improved unsigned FP mantissa multiplication. For an $n\times n$ binary multiplier, the BPR algorithm with parallel partial product reduction reduces the number of partial product rows from $n$ to $\frac{n}{4}$. The new algorithm performs partial product row reduction without the 2's complement, Negative Encoding (NE), and Sign Extension (SE) steps that Booth-recoded multiplication requires; these computations are unnecessary for unsigned floating-point multiplication. The BPR algorithm thus eliminates the cost of the 2's-complement circuit and the sign-bit extension of each partial product row present in the Modified Booth Encoding (MBE) algorithm. Compared with conventional Booth multipliers, unsigned mantissa multiplication using partial product array reduction with the BPR technique uses 27.5% less area and 18% less power, and is 33.33% faster at generating one partial product row. The $8\times 8$ and $16\times 16$ BPR binary multipliers are verified on a TSMC 65 nm 1.1 V CMOS standard cell library, and the synthesis reports are compared with conventional and the best-reported improved Booth multipliers. Finally, a MAC design using 16-bit FP arithmetic with an $8\times 8$ mantissa multiplier is developed for the CNN accelerator, and it is validated with suitable error metrics such as Mean Relative Error (MRE) to assess the suggested architecture for AI applications. |
| DOI: | 10.1109/ACCESS.2025.3559226 |
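The abstract's core idea, grouping multiplier bits into higher-radix digits so that an $n\times n$ product needs far fewer partial product rows, can be sketched in software. The following Python sketch models generic unsigned radix-$2^k$ digit recoding (radix-8 when $k=3$, yielding $\lceil n/k \rceil$ rows); it is an illustrative model only, not the authors' hardware BPR scheme, and the function names are invented for this example.

```python
def unsigned_radix_digits(x, n, k):
    """Split an n-bit unsigned multiplier x into ceil(n/k) radix-2**k digits,
    least-significant digit first. Each digit lies in [0, 2**k - 1]."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range((n + k - 1) // k)]

def recoded_multiply(a, b, n=8, k=3):
    """Multiply unsigned a and b by summing one partial product row per
    radix digit: row_i = a * digit_i, left-shifted by k*i bits.
    Unsigned digits need no 2's complement, negative encoding, or
    sign extension, mirroring the simplification the abstract describes."""
    total = 0
    for i, digit in enumerate(unsigned_radix_digits(b, n, k)):
        total += (a * digit) << (k * i)
    return total
```

With $n=8$ and $k=3$ the multiplier is consumed in three digits, i.e. three partial product rows instead of eight for a bit-serial array; a hardware implementation would generate the small digit multiples ($2a$, $3a$, ..., $7a$) with shifts and adds rather than a general multiply.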