Recursive algorithms for computing the Frobenius norm of a real array are proposed, based on hypot, a hypotenuse function. A comparison of their relative accuracy bounds with those of the BLAS routine xNRM2 shows that the proposed algorithms could in many cases be significantly more accurate. The scalar recursive algorithms are vectorized with Intel's vector instructions to achieve performance comparable to xNRM2, and are further parallelized with OpenCilk. Some scalar algorithms are unconditionally bitwise reproducible, while the reproducibility of the vector ones depends on the vector width. A modification of the proposed algorithms to compute the vector p-norm is also presented.
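The idea of building a norm recursively from a hypotenuse function can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `frobenius_norm_recursive`, the halving split, and the use of Python's `math.hypot` are assumptions made for illustration only, and the sketch is neither vectorized nor parallelized.

```python
import math


def frobenius_norm_recursive(x):
    """Norm of a flat sequence of reals via recursive halving.

    Illustrative assumption: split the sequence in two, compute the norm
    of each half recursively, and combine the two partial norms with
    hypot(a, b) = sqrt(a*a + b*b), which avoids forming the squares
    explicitly and so avoids their overflow for large entries.
    """
    n = len(x)
    if n == 0:
        return 0.0
    if n == 1:
        return abs(float(x[0]))
    mid = n // 2
    left = frobenius_norm_recursive(x[:mid])
    right = frobenius_norm_recursive(x[mid:])
    return math.hypot(left, right)


# Small exact case: the 3-4-5 right triangle.
small = frobenius_norm_recursive([3.0, 4.0])

# Entries whose squares overflow in double precision still give a
# finite norm, because no square is ever formed explicitly.
large = frobenius_norm_recursive([1e200, 1e200])
```

For a matrix, the entries would first be flattened into one sequence, since the Frobenius norm is the Euclidean norm of all entries taken together. The order in which partial norms are combined is fixed by the split rule, which is one reason a recursive scheme of this kind can give the same result from run to run.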
Vedran Novaković
Parallel Processing Letters
Croatian Chamber of Economy
Vedran Novaković studied this question.
www.synapsesocial.com/papers/699010ce2ccff479cfe5704f — DOI: https://doi.org/10.1142/s0129626426500027