The growing need for post-quantum security has driven significant research into efficient implementations of lattice-based cryptography, particularly for resource-constrained embedded devices. Hardware accelerators can achieve high throughput, but they often sacrifice flexibility and complicate software development, whereas software-only implementations struggle to meet the performance demands of real-time cryptographic workloads. Despite this trade-off, few studies focus on hardware/software co-design approaches. In this paper, we present Kyress, a resource-balanced, secure, and scalable CRYSTALS-Kyber cryptosystem designed for low-cost embedded platforms. We implement three execution configurations: software-only execution on a scalar processor, a scalar processor combined with a vector co-processor, and a full hardware/software co-design, with evaluation on an FPGA platform as well as a simulator. Experimental results show that Kyress achieves a 6× to 9.76× speedup over state-of-the-art software implementations and delivers competitive performance compared with existing hardware/software accelerators, while requiring significantly fewer hardware resources.
Tran-Hoang et al. studied this question.