Resource-constrained wearable systems often need to execute signal-processing and AI workloads, which forces trade-offs among performance, power, and FPGA resource usage. This paper presents a lightweight convolution-aware soft processor for embedded signal processing on such devices. The architecture occupies a middle ground between dedicated accelerators and lightweight soft processors. It integrates a two-lane SIMD integer datapath with a split-stage IEEE-754 floating-point accumulation pipeline. The split-stage design overlaps multiplication, accumulation, and operand fetch, improving arithmetic utilization while keeping resource costs low. The processor was implemented on the Artix-7-based Basys3 platform and evaluated using one-dimensional convolution workloads. The experimental results demonstrate a 6× speedup over MicroBlaze-class soft processors at the same static power (0.073 W), with only 44% higher dynamic power consumption. The architecture achieves this with significantly fewer FPGA resources than accelerator-based solutions such as DPU overlays. It therefore offers a practical, balanced design point for wearable and other resource-constrained FPGA systems that require both software-defined flexibility and deterministic convolution performance.
Diaz et al. (Mon,) studied this question.
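For readers unfamiliar with the workload, the following is a minimal behavioral sketch of a one-dimensional convolution written as a multiply-accumulate loop, together with a variant that produces two adjacent outputs per inner-loop step. The function names `conv1d_valid` and `conv1d_valid_two_lane` are hypothetical, and the two-output variant is only an illustrative software model of lane-level parallelism; it is not taken from the paper and does not describe the processor's actual datapath or pipeline.

```python
def conv1d_valid(signal, kernel):
    """Direct-form 1D convolution, returning only fully overlapped outputs."""
    n, k = len(signal), len(kernel)
    if k == 0 or n < k:
        return []
    out = []
    for i in range(n - k + 1):
        acc = 0.0
        for j in range(k):
            # True convolution applies the kernel in reversed order.
            acc += signal[i + j] * kernel[k - 1 - j]
        out.append(acc)
    return out


def conv1d_valid_two_lane(signal, kernel):
    """Same result, computing two adjacent outputs per inner-loop step.

    Assumption: this is a software illustration of two parallel lanes
    sharing one kernel coefficient per step, not the paper's hardware.
    """
    n, k = len(signal), len(kernel)
    if k == 0 or n < k:
        return []
    m = n - k + 1
    out = [0.0] * m
    for i in range(0, m, 2):
        acc0 = 0.0
        acc1 = 0.0
        has_lane1 = i + 1 < m  # the last step may have only one valid lane
        for j in range(k):
            w = kernel[k - 1 - j]  # coefficient shared by both lanes
            acc0 += signal[i + j] * w
            if has_lane1:
                acc1 += signal[i + 1 + j] * w
        out[i] = acc0
        if has_lane1:
            out[i + 1] = acc1
    return out
```

Both functions perform the same multiply-accumulate operations in the same order for each output, so they return identical results; the second simply halves the number of inner-loop iterations per pair of outputs by reusing each fetched coefficient twice.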