Design of Radix-8 Unsigned Bit Pair Recoding Algorithm-Based Floating-Point Multiplier for Neural Network Computations
| Published in: | IEEE Access, Vol. 13, pp. 63969-63980 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 |
| Subjects: | |
| Summary: | Neural network computations for Artificial Intelligence (AI) applications demand high-speed, low-power, and area-efficient Floating-Point (FP) multiplication. In this work, we propose an efficient unsigned Bit Pair Recoding (BPR) algorithm for FP unsigned mantissa multiplication with improved area, power, and speed. Using the BPR algorithm with parallel-processed partial product reduction, the partial product rows of an $n\times n$ binary multiplier are reduced from $n$ to $\frac{n}{4}$. The new algorithm performs partial product row reduction without the 2's complement, Negative Encoding (NE), and Sign Extension (SE) steps required by Booth-recoded multiplication; these computations are unnecessary for floating-point unsigned multiplication. The computational cost of the 2's complement circuit and the sign-bit extension of each partial product row in the Modified Booth Encoding (MBE) algorithm is thus effectively eliminated by the BPR algorithm. Unsigned mantissa multiplication using partial product array reduction with the BPR technique uses 27.5% less area, 18% less power, and is 33.33% faster at generating one partial product row than conventional Booth multipliers. The BPR binary multipliers are verified as $8\times 8$ and $16\times 16$ multipliers on a TSMC 65 nm 1.1 V CMOS standard cell library, and the synthesis reports are compared with conventional and the best-reported improved Booth multipliers. Finally, a MAC design using 16-bit FP arithmetic with an $8\times 8$ mantissa multiplier for a CNN accelerator is developed and validated with suitable error metrics such as Mean Relative Error (MRE) to assess the suggested architecture for AI applications. (An illustrative recoding sketch follows the record below.) |
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2025.3559226 |
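
To make the row-reduction idea in the abstract concrete, here is a minimal Python sketch of generic unsigned radix-8 digit recoding, assuming 3-bit multiplier digits and software-level multiples. It is not the paper's exact BPR scheme (which the authors report reduces the rows from $n$ to $\frac{n}{4}$); the function name and parameters are illustrative only. It shows how grouping unsigned multiplier bits shrinks the number of partial product rows, and why no 2's complement, negative encoding, or sign extension is needed when both operands are unsigned.

```python
# Hypothetical illustration (not the paper's exact BPR scheme): generic unsigned
# radix-8 digit recoding for an n x n multiplier. Grouping the multiplier into
# 3-bit digits yields ceil(n/3) partial product rows instead of n, and because
# both operands are unsigned, every row is non-negative, so no 2's complement,
# negative encoding, or sign extension logic is needed.

def radix8_unsigned_partial_products(a: int, b: int, n: int = 8):
    """Return the shifted partial products of a*b using unsigned radix-8 digits."""
    rows = []
    for i in range(0, n, 3):                 # one 3-bit digit per iteration
        digit = (b >> i) & 0b111             # unsigned digit in 0..7
        rows.append((a * digit) << i)        # digit-weighted, shifted multiple
    return rows

if __name__ == "__main__":
    a, b, n = 0xB7, 0x5C, 8                  # example 8-bit unsigned mantissas
    rows = radix8_unsigned_partial_products(a, b, n)
    assert sum(rows) == a * b                # the rows sum to the exact product
    print(len(rows), "partial product rows for an 8x8 multiply")  # 3 rows vs. 8
```

Summing the rows reproduces the exact product, so a carry-save reduction tree over these few non-negative rows suffices; the main hardware cost of such unsigned high-radix schemes is forming the odd multiples of the multiplicand, which is the design space the article's BPR recoding addresses.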