Low-Latency Parallel Row-Layered Min-Sum Decoders with Scheduling Optimization for MDPC Code-Based Post-Quantum Cryptography
| Published in: | Journal of Signal Processing Systems, Vol. 97, No. 5-6, pp. 257-268 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York; Springer US; 1 December 2025; Springer Nature B.V |
| Subjects: | |
| ISSN: | 1939-8018, 1939-8115 |
| Summary: | The medium-density parity-check (MDPC) code-based cryptosystem remains a finalist for the post-quantum cryptography standard. Row-layered Min-Sum decoding achieves an efficient trade-off between performance and complexity. Previous studies designed parallel row-layered MDPC decoders by enforcing constraints on the parity-check matrix. The preliminary work introduced two schemes that reduce latency without adding further constraints on the parity-check matrix, since such constraints may raise security concerns. The first scheme simultaneously processes multiple identity blocks in the parity-check matrix, and the second processes a larger block of variable width each time. However, their speedup is limited by data access conflicts. This paper proposes optimizations to the computation scheduling of each scheme to further reduce latency. Out-of-order processing of identity blocks is proposed for the first scheme to reduce memory access conflicts, so that more blocks can be processed simultaneously. For the second scheme, out-of-order processing of block segments is explored to update more messages simultaneously without causing conflicts in layered decoding. Efficient hardware architectures are also designed for both proposed scheduling optimizations. For an example code, the two proposed optimizations achieve speedups of 35% and 9%, respectively, over the best prior designs with negligible area overhead. Out-of-order processing is more effective for the first design. |
| DOI: | 10.1007/s11265-025-01964-9 |