What question did this study set out to answer?

The objective is to evaluate alternative programming models and to move codes from CUDA to them, improving performance and portability across accelerator platforms.

April 23, 2026 · Open Access

Interim report on programming models solutions. Deliverable D3.2 of the HORIZON-EUROHPC-JU-2021-COE-01 project MaX (101093374)

Key Points

Analyzed transitions from CUDA to OpenMP and OpenACC in QUANTUM ESPRESSO.
Implemented SYCL C++ in BIGDFT for better performance and portability.
Conducted comparative analysis of FFTXlib against other libraries to assess scalability and performance.
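FFTXlib in QUANTUM ESPRESSO was successfully offloaded with the OpenMP backend; a batched version for the Cray/HIP toolchain targets better performance and scalability.
In BIGDFT, crucial kernels implemented in SYCL C++ showed significant results in both portability and performance.
An optimised band parallelization scheme is designed to overcome the current limits of FFTXlib in strong and weak scalability.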

Abstract

This interim report presents the status and achievements of task T3.2 within WP3. We first discuss three different strategies for transitioning from CUDA to programming models better suited to achieving functional and performance portability across all families of accelerators.

• In QUANTUM ESPRESSO, developers have adopted OpenMP and OpenACC as alternative backends for offloading, gradually phasing out the previous CUDA Fortran implementation. Several proposed interfaces have been adopted for the whole high-level code layer, so that a single source code is maintained and remains transparent with respect to the two backends.
• YAMBO developers present the use of the deviceXlib MaX library to integrate multiple offload backends inside large Fortran codes.
• In BIGDFT, crucial kernels have been implemented using the SYCL C++ programming model, with significant results in both portability and performance.

Another part of this report is dedicated to the work done in T3.2 on the FFTXlib of QUANTUM ESPRESSO. The library has been successfully offloaded with the OpenMP backend. To improve the performance and scalability of this port, T3.2 has taken charge of implementing a batched version of the library for the Cray/HIP toolchain. To understand the intrinsic scalability limits of this kernel, we are also performing a comparative analysis of FFTXlib performance against analogous FFT libraries for accelerators (cuFFTMp, heFFTe). We present the work done to design and implement an optimised band parallelization scheme that can efficiently work alongside FFTXlib parallelism in QUANTUM ESPRESSO, overcoming its current limits in strong and weak scalability. As an appendix, we also present the work done by the BIGDFT group on streamlining and optimizing the benchmarking process. We conclude with some final remarks and point out two main topics of upcoming activities in T3.2: data exchange within workflows, and the improvement of checkpointing and code resilience.
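To make the idea of band parallelization working alongside a batched FFT more concrete, the following is a minimal sketch in Python. It is not the QUANTUM ESPRESSO or FFTXlib implementation: it assumes a simple contiguous block distribution of bands over process groups and a fixed batch size, and the names `band_blocks` and `fft_batches` and all parameters are hypothetical.

```python
def band_blocks(n_bands, n_groups):
    """Split band indices 0..n_bands-1 into n_groups contiguous blocks.

    Assumption: a plain block distribution in which the first
    (n_bands % n_groups) groups receive one extra band.
    """
    if n_bands < 0 or n_groups <= 0:
        raise ValueError("n_bands must be >= 0 and n_groups must be > 0")
    base, extra = divmod(n_bands, n_groups)
    blocks = []
    start = 0
    for group in range(n_groups):
        size = base + (1 if group < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def fft_batches(bands, batch_size):
    """Group the bands owned by one process group into batches.

    Each batch stands for a set of bands that would be transformed
    together in a single batched FFT call (hypothetical usage).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be > 0")
    return [bands[i:i + batch_size] for i in range(0, len(bands), batch_size)]


if __name__ == "__main__":
    # 10 bands shared by 3 band groups, transformed in batches of 2.
    for group, bands in enumerate(band_blocks(10, 3)):
        print(group, fft_batches(bands, 2))
```

In this sketch each band group works on its own subset of bands, while the distributed FFT itself would still be parallelized inside each group; the batch size controls how many bands are handed to the FFT at once.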
