Memristor-Based Circuit Implementation and Circuitry Optimized Algorithm for Mamba Language Network
| Published in: | IEEE transactions on circuits and systems. I, Regular papers, pp. 1-14 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 2025 |
| ISSN: | 1549-8328, 1558-0806 |
| Summary: | Language networks are crucial in artificial intelligence, with the novel Mamba architecture significantly reducing computations and consumption compared to the traditional transformer network. However, a full-circuit implementation of the Mamba network has not been proposed due to the complexity of computations and data storage. Additionally, optimized hardware-aware parallel algorithms for Mamba inference in circuits remain undeveloped. This work addresses these challenges by presenting a memristor-based full-circuit implementation of the Mamba network and introducing a computing-in-memory parallel-aware algorithm tailored for circuit-level inference. The implementation includes: 1) Standard 1T1M memristor crossbar and depthwise separable convolution memristor crossbar for different convolutions. 2) Computing-in-memory implicit latent state circuits for the computation and transition of latent states. 3) Functional circuits for SiLU activation, RMS normalization, and multi-layer multiply-accumulate operations. 4) Optimized algorithm and circuit implementation for hardware-aware inference, achieving parallel scanning and hardware awareness in circuits. The proposed circuit enables analog signal computations and eliminates redundant analog-to-digital conversions and intermediate storage. A basic single-sentence generation task was simulated in PSPICE, validating the circuit's correctness. Analyses of analog computation accuracy, circuit stability, and power consumption demonstrate the proposed circuit's advantages, highlighting its potential as a fundamental module for large-scale circuit integration and complex text generation tasks. |
|---|---|
| DOI: | 10.1109/TCSI.2025.3584247 |
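
The summary names several building blocks (memristor crossbar multiply-accumulate, SiLU activation, RMS normalization, latent-state computation) without defining them. As a reading aid, the following is a minimal numerical sketch of their textbook definitions in plain Python. It is not the authors' circuit or their parallel algorithm: the function names are hypothetical, the crossbar is assumed ideal (no wire resistance or device variation), and the latent-state recurrence uses fixed coefficients and a sequential loop, whereas Mamba makes the coefficients input-dependent and the paper describes a parallel scan.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def rms_norm(xs, weights, eps=1e-6):
    """RMS normalization: divide by the root-mean-square, then scale."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [w * x / rms for x, w in zip(xs, weights)]


def crossbar_mac(voltages, conductances):
    """Ideal crossbar multiply-accumulate (assumption: no parasitics).

    Row voltages V_i drive cells with conductance G[i][j]; each column
    current is I_j = sum_i V_i * G[i][j] (Ohm's and Kirchhoff's laws).
    """
    cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(cols)
    ]


def ssm_scan(xs, a_bar, b_bar, c):
    """Sequential latent-state recurrence for one channel.

    h_t = a_bar * h_{t-1} + b_bar * x_t (elementwise, diagonal state),
    y_t = sum(c * h_t). Coefficients are fixed here for simplicity.
    """
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [a * hi + b * x for a, hi, b in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys
```

For example, `crossbar_mac([1.0, 2.0], [[1, 2], [3, 4]])` sums each column's voltage-conductance products and returns `[7.0, 10.0]`, which is the matrix-vector product that a crossbar evaluates in a single analog step.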