Design of Radix-8 Unsigned Bit Pair Recoding Algorithm-Based Floating-Point Multiplier for Neural Network Computations
| Published in: | IEEE Access, Volume 13, pp. 63969-63980 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
| ISSN: | 2169-3536 |
| Summary: | Neural network computations for Artificial Intelligence (AI) applications demand high-speed, low-power, and area-efficient Floating-Point (FP) multiplication. In this work, we propose an efficient unsigned Bit Pair Recoding (BPR) algorithm for area-, power-, and speed-improved unsigned FP mantissa multiplication. For an $n\times n$ binary multiplier, the BPR algorithm with parallel partial product reduction reduces the number of partial product rows from $n$ to $\frac{n}{4}$. The new algorithm performs partial product row reduction without the 2's complement, Negative Encoding (NE), and Sign Extension (SE) steps that Booth-recoded multiplication requires; these computations are unnecessary for unsigned floating-point multiplication. The BPR algorithm thus eliminates the cost of the 2's-complement circuit and the sign-bit extension of each partial product row present in the Modified Booth Encoding (MBE) algorithm. Compared with conventional Booth multipliers, unsigned mantissa multiplication using partial product array reduction with the BPR technique uses 27.5% less area and 18% less power, and is 33.33% faster at generating one partial product row. The $8\times 8$ and $16\times 16$ BPR binary multipliers are verified on a TSMC 65 nm 1.1 V CMOS standard cell library, and the synthesis reports are compared with conventional and the best-reported improved Booth multipliers. Finally, a MAC design using 16-bit FP arithmetic with an $8\times 8$ mantissa multiplier is developed for the CNN accelerator, and it is validated with suitable error metrics such as Mean Relative Error (MRE) to assess the suggested architecture for AI applications. |
| DOI: | 10.1109/ACCESS.2025.3559226 |
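The abstract's core idea, grouping multiplier bits into higher-radix digits so that an $n\times n$ product needs far fewer partial product rows, can be sketched in software. The following Python sketch models generic unsigned radix-$2^k$ digit recoding (radix-8 when $k=3$, yielding $\lceil n/k \rceil$ rows); it is an illustrative model only, not the authors' hardware BPR scheme, and the function names are invented for this example.

```python
def unsigned_radix_digits(x, n, k):
    """Split an n-bit unsigned multiplier x into ceil(n/k) radix-2**k digits,
    least-significant digit first. Each digit lies in [0, 2**k - 1]."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range((n + k - 1) // k)]

def recoded_multiply(a, b, n=8, k=3):
    """Multiply unsigned a and b by summing one partial product row per
    radix digit: row_i = a * digit_i, left-shifted by k*i bits.
    Unsigned digits need no 2's complement, negative encoding, or
    sign extension, mirroring the simplification the abstract describes."""
    total = 0
    for i, digit in enumerate(unsigned_radix_digits(b, n, k)):
        total += (a * digit) << (k * i)
    return total
```

With $n=8$ and $k=3$ the multiplier is consumed in three digits, i.e. three partial product rows instead of eight for a bit-serial array; a hardware implementation would generate the small digit multiples ($2a$, $3a$, ..., $7a$) with shifts and adds rather than a general multiply.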