The Parallel FDFM Processor Core Approach for CRT-based RSA Decryption

Yasuaki Ito, Koji Nakano, Song Bo


One of the key points of success in high performance computation using an FPGA is the efficient usage of DSP slices and block RAMs in it. This paper presents a FDFM (Few DSP slices and Few block RAMs) processor core approach for implementing RSA encryption. In our approach, an efficient hardware algorithm for Chinese Remainder Theorem (CRT) based RSA decryption using Montgomery multiplication algorithm is implemented. Our hardware algorithm supporting up-to 2048-bit RSA decryption is designed to be implemented using one DSP slice, one block RAM and few logic blocks in the Xilinx Virtex-6 FPGA. The implementation results show that our RSA core for 1024-bit RSA decryption runs in 13.74ms. Quite surprisingly, the multiplier in the DSP slice used to compute Montgomery multiplication works in more than 95% clock cycles during the processing. Hence, our implementation is close to optimal in the sense that it has only less than 5% overhead in multiplication and no further improvement is possible as long as CRT-based Montgomery multiplication based algorithm is applied. We have also succeeded in implementing 320 RSA decryption cores in one Xilinx Virtex-6 FPGA XC6VLX240T-1 which work in parallel. The implemented parallel 320 RSA cores achieve 23.03 Mbits/s throughput for 1024-bit RSA decryption.


RSA decryption; FPGA; Montgomery modular multiplication; Chinese Remainder Theorem; DSP slice

Full Text:



  • There are currently no refbacks.