High-performance area-efficient polynomial ring processor for CRYSTALS-Kyber on FPGAs

Bibliographic details
Published in: Integration (Amsterdam), Volume 78, pp. 25-35
Main authors: Chen, Zhaohui; Ma, Yuan; Chen, Tianyu; Lin, Jingqiang; Jing, Jiwu
Format: Journal Article
Language: English
Publication details: Amsterdam, Elsevier B.V., 1 May 2021
ISSN: 0167-9260, 1872-7522
Description
Summary: Quantum resistance is a new design criterion for cryptographic algorithms in the era of quantum supremacy. Lattice-based cryptography is considered secure against quantum computing. CRYSTALS-Kyber is a promising lattice-based candidate in the post-quantum cryptography standardization process. This paper proposes a high-performance polynomial ring processor for the CRYSTALS-Kyber algorithm. The processor executes optimized polynomial ring arithmetic, which reduces the number of modular multiplications by over 20% and the number of modular additions by over 50% compared with straightforward implementations. In addition, the forward and inverse Number Theoretic Transform (NTT) reuse the same control logic with the help of an efficient configurable butterfly unit, which minimizes the area of the finite state machine. Further, the underlying dual-column sequential storage scheme removes the memory-access bottleneck. To evaluate the performance, a fully pipelined architecture is implemented on a low-cost FPGA platform. Benefiting from these optimizations, the Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs, a speedup by a factor of 2.1 over the state-of-the-art implementation.
• The Kyber1024 processor can perform the NTT operation for a 4-dimensional polynomial vector in 17.1 μs on a low-cost FPGA.
• More than 20% of the modular multiplication operations in polynomial ring arithmetic are saved.
• The optimized NTT signal flow reuses the loop control logic to save nearly 50% of the resources.
• Dual-column sequential storage improves memory bandwidth.
• The configurable butterfly unit supports the Cooley–Tukey butterfly-based forward NTT, the Gentleman–Sande butterfly-based inverse NTT, and other meta operations.
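The idea of one butterfly unit serving both transforms can be illustrated with a small software model. The sketch below is not the paper's hardware design. It assumes the parameters q = 3329, n = 256 and ζ = 17 from the public Kyber specification (this record does not state them), and all names (`butterfly`, `ntt`, `inv_ntt`, `ZETAS`) are hypothetical. A single `butterfly` function switches between the Cooley–Tukey form (multiply, then add and subtract) and the Gentleman–Sande form (add and subtract, then multiply), and both transforms share the same loop structure.

```python
Q = 3329      # Kyber modulus (assumed from the public specification)
N = 256       # polynomial degree
ROOT = 17     # primitive 256th root of unity modulo Q


def bitrev7(k):
    """Reverse the 7-bit binary representation of k."""
    return int(f"{k:07b}"[::-1], 2)


# Twiddle factors in bit-reversed order, as used by an in-place NTT.
ZETAS = [pow(ROOT, bitrev7(k), Q) for k in range(128)]


def butterfly(a, b, zeta, inverse=False):
    """Configurable butterfly: Cooley-Tukey when inverse is False,
    Gentleman-Sande when inverse is True."""
    if not inverse:
        t = zeta * b % Q                      # multiply first
        return (a + t) % Q, (a - t) % Q
    return (a + b) % Q, (a - b) * zeta % Q    # multiply last


def _transform(poly, lengths, inverse):
    """Shared loop control for both directions; only the layer order,
    the twiddle value and the butterfly mode differ."""
    r = list(poly)
    for length in lengths:
        for i, start in enumerate(range(0, N, 2 * length)):
            zeta = ZETAS[128 // length + i]
            if inverse:
                zeta = pow(zeta, -1, Q)       # modular inverse (Python 3.8+)
            for j in range(start, start + length):
                r[j], r[j + length] = butterfly(r[j], r[j + length], zeta, inverse)
    return r


def ntt(poly):
    """Forward 7-layer NTT with Cooley-Tukey butterflies."""
    return _transform(poly, [128, 64, 32, 16, 8, 4, 2], inverse=False)


def inv_ntt(poly):
    """Inverse NTT with Gentleman-Sande butterflies and final scaling by 1/128."""
    r = _transform(poly, [2, 4, 8, 16, 32, 64, 128], inverse=True)
    scale = pow(128, -1, Q)
    return [x * scale % Q for x in r]
```

Applying `inv_ntt(ntt(p))` to a list `p` of 256 coefficients in the range 0 to q - 1 returns `p` unchanged. A hardware design would replace the `%` reductions with a dedicated modular reduction circuit and would precompute the inverse twiddle factors rather than inverting them on the fly.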
DOI: 10.1016/j.vlsi.2020.12.005