Large Earth science simulations are increasing their spatial resolution from hundreds of kilometers to the kilometer scale, making memory capacity and memory performance critical constraints. In many large-scale applications, excessive memory usage forces nodes to be undersubscribed, which increases the overall job resource requirements. In this work, we explore the use of several tools to diagnose memory leaks and excessive memory usage in large simulation workflows. While these tools provide detailed allocation traces, they are often difficult to apply to full production runs and can introduce substantial overhead. As a result, they may fail to capture memory behavior at realistic scales. To address this limitation, we investigate a complementary approach based on recording virtual memory usage, including peak virtual memory and memory growth over time, across all MPI ranks. These lightweight techniques allow us to analyze memory behavior throughout the entire workflow with minimal perturbation. Together, these approaches provide practical insight into where and how memory is consumed.
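The abstract does not describe how the virtual memory records are collected, so the following is only a minimal Python sketch of the general idea, assuming a Linux system that exposes per-process memory counters in `/proc/self/status` (`VmPeak` is peak virtual memory, `VmSize` is current virtual memory, `VmHWM` and `VmRSS` are the resident-set equivalents, all in kB). The names `parse_status`, `read_self_status`, `MemoryRecorder`, and `summarize_ranks` are hypothetical. In an MPI job each rank would keep its own recorder; gathering the per-rank results (for example with mpi4py's `comm.gather`) is assumed and not shown.

```python
import os
import re

# Linux /proc/<pid>/status fields reported in kB (assumption: Linux procfs).
# VmPeak/VmSize: peak/current virtual memory; VmHWM/VmRSS: peak/current resident set.
FIELDS = ("VmPeak", "VmSize", "VmHWM", "VmRSS")

# Match "Name:<spaces/tabs><digits><spaces/tabs>kB" on a single line.
_LINE = re.compile(r"^(\w+):[ \t]+(\d+)[ \t]+kB", re.MULTILINE)


def parse_status(text):
    """Return {field: kB} for the memory fields present in /proc status text."""
    values = {name: int(kb) for name, kb in _LINE.findall(text)}
    return {f: values[f] for f in FIELDS if f in values}


def read_self_status(path="/proc/self/status"):
    """Read this process's memory counters; returns {} if procfs is unavailable."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return parse_status(fh.read())


class MemoryRecorder:
    """Hypothetical helper: keep labeled snapshots, report peak and growth."""

    def __init__(self):
        self.samples = []  # list of (label, {field: kB})

    def sample(self, label, snapshot=None):
        # Use a supplied snapshot (e.g. for testing) or read the live counters.
        snap = snapshot if snapshot is not None else read_self_status()
        self.samples.append((label, snap))

    def _values(self, field):
        return [s[field] for _, s in self.samples if field in s]

    def growth_kb(self, field="VmSize"):
        """Last minus first recorded value of `field`; 0 if fewer than 2 samples."""
        vals = self._values(field)
        return vals[-1] - vals[0] if len(vals) >= 2 else 0

    def peak_kb(self, field="VmPeak"):
        """Largest recorded value of `field`; 0 if never recorded."""
        vals = self._values(field)
        return max(vals) if vals else 0


def summarize_ranks(per_rank):
    """Given [(rank, peak_kb, growth_kb), ...] collected from all ranks,
    return (rank with largest peak, rank with largest growth)."""
    top_peak = max(per_rank, key=lambda r: r[1])
    top_growth = max(per_rank, key=lambda r: r[2])
    return top_peak[0], top_growth[0]


if __name__ == "__main__":
    rec = MemoryRecorder()
    rec.sample("start")
    data = [0.0] * 1_000_000  # allocate a list of ~8 MB of references
    rec.sample("after_alloc")
    print("peak VmPeak (kB):", rec.peak_kb())
    print("VmSize growth (kB):", rec.growth_kb())
```

Reading a small text file at chosen points in the workflow is cheap compared with allocation tracing, which is the trade-off the abstract describes; the cost is that it reports only totals per rank, not which call sites allocated the memory.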
Haiying Xu
Dennis John
NSF National Center for Atmospheric Research
www.synapsesocial.com/papers/69d8968f6c1944d70ce0806c — DOI: https://doi.org/10.5281/zenodo.19473876