07. Optimizing Turbomachinery CFD Applications for Modern Multi-Core and Accelerator HPC Systems
Authors: Christopher P. Stone (Computational Science and Engineering LLC), Daryl Y. Lee (University of California, Davis), Roger L. Davis (University of California, Davis)
Abstract: MBFLO3 is a general-purpose, multi-disciplinary simulation code focused on turbomachinery applications, including turbulent conjugate heat transfer. Its multi-block, structured-mesh algorithm has been refactored to perform efficiently on multi-core CPUs, NVIDIA Kepler GPUs, and Intel Xeon Phi accelerators by extending the existing distributed-memory (MPI) parallelism with finer-grained thread and data parallelism. Thread and data parallelism were implemented using OpenACC and OpenMP compiler directives. Significant memory and loop restructuring was necessary to achieve high performance on all three platforms. Refactoring led to performance improvements of over 2x on the host CPU and more than 4x on the Xeon Phi accelerator. OpenMP threading on the multi-core host CPU increased the parallelism by an order of magnitude with over 60% parallel efficiency.
Two-page extended abstract: pdf
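As a rough illustration of the directive-based approach described in the abstract, the sketch below applies OpenMP thread parallelism to the outer loop of a structured-block kernel and data (SIMD) parallelism to the unit-stride inner loop. It is not taken from MBFLO3: the function name `block_laplacian`, the 2D five-point stencil, and the flattened `j * ni + i` layout are assumptions chosen only to show the general pattern, and the abstract does not say which specific restructuring the authors applied.

```c
#include <assert.h>

/*
 * Hypothetical kernel (not from MBFLO3): five-point Laplacian of a scalar
 * field on the interior of one structured ni x nj block.
 * Storage is flattened with i as the fastest index, so the inner loop
 * walks memory with unit stride. Boundary points are left untouched.
 */
void block_laplacian(long ni, long nj, const double *rho, double *res)
{
    assert(ni >= 3 && nj >= 3);

    /* Thread parallelism: rows of the block are shared among threads. */
    #pragma omp parallel for
    for (long j = 1; j < nj - 1; ++j) {
        /* Data parallelism: the contiguous inner loop is vectorized. */
        #pragma omp simd
        for (long i = 1; i < ni - 1; ++i) {
            const long c = j * ni + i;          /* flattened cell index */
            res[c] = rho[c - 1] + rho[c + 1]    /* i-direction neighbors */
                   + rho[c - ni] + rho[c + ni]  /* j-direction neighbors */
                   - 4.0 * rho[c];
        }
    }
}
```

Each thread writes a disjoint set of rows, so no synchronization is needed inside the loop nest. Compiled without OpenMP support, the pragmas are ignored and the same code runs serially, which is one reason directive-based parallelism suits incremental refactoring of an existing MPI code.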