Scalable Multi-FPGA HPC Architecture for Associative Memory System
| Title: | Scalable Multi-FPGA HPC Architecture for Associative Memory System |
|---|---|
| Authors: | Wang, Deyu; Yan, Xiaoze; Yu, Yang; Stathis, Dimitrios; Hemani, Ahmed; Lansner, Anders; Xu, Jiawei; Zheng, Li-Rong; Zou, Zhuo |
| Source: | IEEE Transactions on Biomedical Circuits and Systems. 19(2):454-468 |
| Subject Terms: | multi-FPGA, scalability, associative memory, high performance computing (HPC), spiking neural network (SNN), Bayesian confidence propagation neural network (BCPNN) |
| Access URL: | https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-363869 |
| Database: | SwePub |
| Abstract: | Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions like associative memory. However, the current approach using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200x10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72x and a power reduction exceeding 5.28x under the same network configuration. Compared with the state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task. |
| ISSN: | 1932-4545, 1940-9990 |
| DOI: | 10.1109/TBCAS.2024.3446660 |
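
The abstract above mentions a packet-based spike synchronization scheme for inter-board communication among the FPGAs. The paper itself describes an RTL implementation on Xilinx Alveo U50 cards; the following is only a minimal behavioral sketch in Python of what such a scheme could look like, assuming a barrier-style exchange of one spike packet per board per timestep. The names `SpikePacket`, `Board`, and `synchronize` are illustrative placeholders, not identifiers from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpikePacket:
    """One packet per simulation timestep: source board, timestep index,
    and the local hypercolumn units (HCUs) that spiked."""
    board_id: int
    timestep: int
    spiking_hcus: List[int] = field(default_factory=list)

class Board:
    """Behavioral stand-in for one FPGA card holding a slice of the HCU population."""
    def __init__(self, board_id: int, hcu_ids: List[int]):
        self.board_id = board_id
        self.hcu_ids = hcu_ids                 # population-based mapping: which HCUs live here
        self.inbox: List[SpikePacket] = []     # packets received from the other boards

    def step_local(self, timestep: int) -> SpikePacket:
        # Placeholder for intra-board online learning/inference; here an HCU
        # "spikes" on alternating timesteps just to generate traffic.
        spiking = [h for h in self.hcu_ids if (h + timestep) % 2 == 0]
        return SpikePacket(self.board_id, timestep, spiking)

    def receive(self, packet: SpikePacket) -> None:
        self.inbox.append(packet)

def synchronize(boards: List[Board], timestep: int) -> None:
    """Barrier-style exchange: every board packs its local spikes into one
    packet and broadcasts it; no board may advance to timestep+1 until it
    holds a packet from every other board for the current timestep."""
    packets = [b.step_local(timestep) for b in boards]
    for b in boards:
        for p in packets:
            if p.board_id != b.board_id:
                b.receive(p)
    # Check the barrier condition before letting the simulation advance.
    for b in boards:
        got = {p.board_id for p in b.inbox if p.timestep == timestep}
        expected = {x.board_id for x in boards} - {b.board_id}
        assert got == expected, "sync barrier not satisfied"

if __name__ == "__main__":
    # Two boards with 8 HCUs split evenly (toy numbers, not the paper's 200x10 model).
    boards = [Board(0, list(range(0, 4))), Board(1, list(range(4, 8)))]
    for t in range(3):
        synchronize(boards, t)
    print("exchanged", sum(len(b.inbox) for b in boards), "packets over 3 timesteps")
```

The sketch only conveys the synchronization idea: spike events are aggregated into per-timestep packets so inter-board traffic stays sparse, and the barrier keeps all boards cycle-consistent. It says nothing about the paper's actual packet format, link protocol, or cluster-based timing optimization.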