Exploiting matrix symmetry to halve memory footprint offers a substantial opportunity for accelerating memory-bound computations like Sparse Matrix-Vector Multiplication (SpMV). However, symmetric SpMV incurs data conflicts when concurrently writing the output vector. Previous approaches fail to address this issue efficiently, i.e., either are non-scalable or yield poor performance for large high-bandwidth irregular matrices. This paper extends DCS-SpMV , a D ivide-and- C onquer (DC) based shared-memory implementation of S ymmetric SpMV. The key idea of DCS-SpMV is to recursively divide and reorder the matrix-induced conflict graph into independent subgraphs for parallel execution, and construct separate subgraphs to avoid data conflicts. The DC approach naturally transforms the input matrix into a low-conflict part and a high-conflict part, which motivates us to design a conflict-aware hybrid solution DCH-SpMV that executes these two parts using DCS-SpMV and the standard SpMV, respectively. We also develop a machine learning model for DCH-SpMV to predict the optimal number of DC recursions on a given matrix and architecture. In this work, we further optimize the hybrid DC implementation by reducing data conflicts before the DC preprocessing. First, we present a conflict-pruning strategy to decouple certain highly dense columns or rows from the conflict graph of a symmetric matrix. Second, we implement a heuristic to adaptively select the lower or upper triangular part of a symmetric matrix, leading to fewer data conflicts. Our optimizations not only facilitate the DC preprocessing, but also improve the performance of DCH-SpMV. We evaluate our work on both x86 and ARM multi-core CPUs using 298 symmetric sparse matrices from the SuiteSparse Matrix Collection. Our new optimizations improve the performance of previous version 42 by up to 4.89 ×, demonstrating significant speedup over the state-of-the-art approaches including the vendor-tuned Intel oneMKL library.
Building similarity graph...
Analyzing shared references across papers
Loading...
Haozhong Qiu
Chuanfu Xu
Qingsong Wang
ACM Transactions on Architecture and Code Optimization
National University of Defense Technology
China Aerodynamics Research and Development Center
Building similarity graph...
Analyzing shared references across papers
Loading...
Qiu et al. (Mon,) studied this question.
www.synapsesocial.com/papers/69df2c77e4eeef8a2a6b1896 — DOI: https://doi.org/10.1145/3803015