Abstract: We address the efficient implementation of the convolution operator on the GAP8 parallel ultra-low power platform (PULP), a heterogeneous multi-core processor equipped with a fabric controller (FC); a cluster of eight compute cores; and a four-level memory hierarchy with scratchpads instead of conventional, hardware-assisted cache memories. Our solution for this platform transforms the convolution into a general matrix–matrix multiplication (gemm) via the lowering approach, demonstrating that it is possible to attain reasonable performance on the GAP8 by carefully adapting techniques such as tiling and loop parallelism, which are mainstream in the multi-threaded, cache-aware realization of gemm.
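As a rough illustration of the lowering idea mentioned in the abstract, the sketch below turns a small single-channel convolution into one matrix product and computes that product with blocked (tiled) loops. It is not the authors' GAP8 implementation. It assumes the common im2col form of lowering, stride 1, no padding, and the cross-correlation convention used in deep learning. The function names (`im2col`, `gemm_tiled`, `conv2d_lowered`) and the tile size are hypothetical, and scratchpad management and parallelism across the cluster cores are not modeled.

```python
def im2col(image, kh, kw):
    """Lower a 2D image (list of rows) into a (kh*kw) x (out_h*out_w) matrix.

    Each column holds one flattened kh x kw patch (stride 1, no padding).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for i in range(out_h):
        for j in range(out_w):
            col = i * out_w + j
            for di in range(kh):
                for dj in range(kw):
                    cols[di * kw + dj][col] = image[i + di][j + dj]
    return cols, out_h, out_w


def gemm_tiled(a, b, tile=2):
    """Compute C = A * B with the three loops blocked into tiles.

    The tile size is an illustrative assumption; on a real scratchpad-based
    platform it would be chosen so the blocks fit in the fast memory level.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Multiply one block of A by one block of B, accumulating into C.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def conv2d_lowered(image, kernel, tile=2):
    """Convolution (cross-correlation form) expressed as im2col followed by gemm."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    # A single filter becomes a 1 x (kh*kw) row matrix.
    filt = [[kernel[di][dj] for di in range(kh) for dj in range(kw)]]
    flat = gemm_tiled(filt, cols, tile)[0]
    # Reshape the flat result row back into an out_h x out_w output.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(conv2d_lowered(image, kernel))
```

With several filters, each filter would contribute one row to the left-hand matrix, so the whole layer still reduces to a single gemm call.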
Ramírez et al. studied this question.
www.synapsesocial.com/papers/68e78968b6db6435876fbe3c — DOI: https://doi.org/10.1007/s11227-024-05927-y
Cristián Ramírez
Adrián Castelló
Héctor Martínez
The Journal of Supercomputing
Universitat Politècnica de València
University of Córdoba