Low-Latency Parallel Row-Layered Min-Sum Decoders with Scheduling Optimization for MDPC Code-Based Post-Quantum Cryptography

Bibliographic Details
Published in:Journal of Signal Processing Systems Vol. 97; no. 5-6; pp. 257-268
Main Authors: Cai, Jiaxuan; Zhang, Xinmiao
Format: Journal Article
Language:English
Published: New York Springer US 01.12.2025
Springer Nature B.V
ISSN:1939-8018, 1939-8115
Description
Summary:The medium-density parity-check (MDPC) code-based cryptosystem remains a finalist in the post-quantum cryptography standardization process. Row-layered Min-Sum decoding achieves an efficient trade-off between performance and complexity. Previous studies designed parallel row-layered MDPC decoders by enforcing constraints on the parity-check matrix. Because stricter constraints may raise security concerns, the preliminary work introduced two schemes that reduce the latency without increasing the constraints on the parity-check matrix. The first scheme simultaneously processes multiple identity blocks in the parity-check matrix, and the second processes a larger block of variable width each time. However, their speedup is limited by data access conflicts. In this paper, optimizations are proposed for the computation scheduling of each scheme to further reduce the latency. For the first scheme, out-of-order processing of identity blocks is proposed to reduce memory access conflicts, so that more blocks can be processed simultaneously. For the second scheme, out-of-order processing of block segments is explored to update more messages simultaneously without causing conflicts in layered decoding. Efficient hardware architectures are also designed for both proposed scheduling optimizations. For an example code, the two proposed optimizations achieve speedups of 35% and 9%, respectively, over the best prior designs with negligible area overhead. Out-of-order processing is thus more effective on the first design.
DOI:10.1007/s11265-025-01964-9
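
For readers unfamiliar with the row-layered Min-Sum decoding named in the summary, the following Python sketch shows the generic textbook message schedule: each row of the parity-check matrix is treated as one layer, and the a-posteriori values are updated immediately after each layer. It is not the authors' parallel architecture and does not include their out-of-order scheduling. The function name `layered_min_sum`, the list-of-rows matrix representation, the soft LLR inputs, and the toy (7,4) Hamming matrix in the usage lines are all assumptions made for illustration; a real MDPC matrix is far larger and built from circulant blocks.

```python
from typing import List, Sequence, Tuple


def layered_min_sum(rows: Sequence[Sequence[int]],
                    channel_llr: Sequence[float],
                    max_iters: int = 20) -> Tuple[List[int], bool]:
    """Generic row-layered Min-Sum decoding (illustrative sketch only).

    rows[i] lists the column indices of the nonzero entries in row i of H.
    Every row is assumed to have weight >= 2.
    A positive LLR means bit 0 is more likely.
    Returns (hard decisions, True if all parity checks are satisfied).
    """
    posterior = list(channel_llr)                  # running a-posteriori LLRs
    # One check-to-variable message per nonzero entry of H, initially zero.
    c2v = [[0.0] * len(row) for row in rows]
    hard = [1 if llr < 0 else 0 for llr in posterior]

    for _ in range(max_iters):
        for i, row in enumerate(rows):             # one layer = one row
            # Variable-to-check messages: remove this row's old contribution.
            v2c = [posterior[j] - c2v[i][k] for k, j in enumerate(row)]

            # Track the two smallest magnitudes and the product of signs.
            min1 = min2 = float("inf")
            min_pos = -1
            sign_prod = 1
            for k, q in enumerate(v2c):
                mag = abs(q)
                if mag < min1:
                    min1, min2, min_pos = mag, min1, k
                elif mag < min2:
                    min2 = mag
                if q < 0:
                    sign_prod = -sign_prod

            # New check-to-variable messages exclude each entry's own input,
            # and the posterior is refreshed before the next layer starts.
            for k, j in enumerate(row):
                mag = min2 if k == min_pos else min1
                sign = sign_prod * (-1 if v2c[k] < 0 else 1)
                c2v[i][k] = sign * mag
                posterior[j] = v2c[k] + c2v[i][k]

        hard = [1 if llr < 0 else 0 for llr in posterior]
        if all(sum(hard[j] for j in row) % 2 == 0 for row in rows):
            return hard, True
    return hard, False


# Toy usage: (7,4) Hamming code, all-zero codeword, one unreliable bit.
rows = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
llr = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, -1.0]
decoded, converged = layered_min_sum(rows, llr)
print(decoded, converged)
```

Because each layer reads posterior values written by the previous layer, two layers that touch the same column cannot be updated at the same time without care. That dependency is the kind of conflict the summary says limits the speedup of parallel designs.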