Abstract
Efficient use of the resources of FPGA-based system-on-modules (SoMs) is critical for deploying deep neural networks at the edge. This work quantifies the impact of software multithreading on an image classification task running on the AMD Kria KV260, which is built around a Zynq UltraScale+ MPSoC with a quad-core Cortex-A53 and a DPU accelerator. Three image classification models (MobileNetV2, ResNet-50, and SqueezeNet) were benchmarked under identical conditions while varying the number of threads, with each thread driving an independent Vitis-AI runner instance. The accuracies of the floating-point and quantized models were measured on a host PC, and the KV260 inference throughput was evaluated on a 500-image subset of the ImageNet dataset. Thread concurrency raised throughput by approximately 3.1× to 3.67× across the three models, with four threads as the optimum, and did not degrade the models' Top-1 accuracy. The results provide board-specific evidence that lightweight software multithreading can unlock a significant portion of the KV260's performance.
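The threading scheme described above, in which each thread drives its own runner over a share of the image set, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: `FakeDpuRunner`, `partition`, and `run_benchmark` are hypothetical names, and the runner only sleeps to emulate a blocking DPU call so the example runs without a KV260 or Vitis AI installed. On the board, each runner would instead be a Vitis-AI runner created from the compiled model.

```python
import threading
import time
from typing import Callable, List, Sequence, Tuple


class FakeDpuRunner:
    """Hypothetical stand-in for a Vitis-AI runner (NOT the real VART API).

    It sleeps for a fixed time to mimic a blocking per-image DPU call;
    time.sleep releases the GIL, so several threads can wait concurrently.
    """

    def __init__(self, latency_s: float) -> None:
        self.latency_s = latency_s

    def infer(self, image_id: int) -> int:
        time.sleep(self.latency_s)
        return image_id % 1000  # dummy ImageNet-style class index


def partition(items: Sequence[int], n: int) -> List[Sequence[int]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks: List[Sequence[int]] = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_benchmark(
    images: Sequence[int],
    num_threads: int,
    make_runner: Callable[[], FakeDpuRunner],
) -> Tuple[float, List[int]]:
    """Classify all images with num_threads threads; return (images/s, predictions)."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    predictions = [0] * len(images)
    # One independent runner per thread, created before timing starts.
    runners = [make_runner() for _ in range(num_threads)]
    chunks = partition(range(len(images)), num_threads)

    def worker(runner: FakeDpuRunner, indices: Sequence[int]) -> None:
        # Each thread writes only its own disjoint slots, so no lock is needed.
        for i in indices:
            predictions[i] = runner.infer(images[i])

    threads = [
        threading.Thread(target=worker, args=(r, c)) for r, c in zip(runners, chunks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return len(images) / elapsed, predictions


if __name__ == "__main__":
    images = list(range(500))  # placeholder IDs for the 500-image subset
    base_fps = None
    for n in (1, 2, 4, 8):
        fps, _ = run_benchmark(images, n, lambda: FakeDpuRunner(0.002))
        base_fps = base_fps or fps  # single-thread throughput is the baseline
        print(f"{n} thread(s): {fps:7.1f} img/s, speedup {fps / base_fps:.2f}x")
```

The speedup printed here is the N-thread throughput divided by the single-thread throughput. Because the mock's sleeps are fully independent, its speedup keeps growing with the thread count; on the real board the shared DPU and the four A53 cores handling pre- and post-processing eventually saturate, which plausibly explains why the reported gains peak at four threads.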
Claudino Costa
José Henrique Brito
Journal of Real-Time Image Processing
Polytechnic Institute of Cávado and Ave
DOI: https://doi.org/10.1007/s11554-026-01889-x