07. Optimizing Turbomachinery CFD applications for Modern Multi-Core and Accelerator HPC Systems
Event Type
Poster
Location
Lower Lobby Concourse
Description
MBFLO3 is a general-purpose, multi-disciplinary simulation code focused on turbomachinery applications, including turbulent conjugate heat transfer. The multi-block, structured-mesh algorithm has been refactored to perform efficiently on multi-core CPUs, NVIDIA Kepler GPUs, and Intel Xeon Phi accelerators by extending the existing distributed-memory (MPI) parallelism with finer-grained thread and data parallelism. Thread and data parallelism were implemented using OpenACC and OpenMP compiler directives. Significant memory and loop restructuring was necessary to achieve high performance on all three platforms. Refactoring led to performance improvements of over 2x on the host CPU and more than 4x on the Xeon Phi accelerator. OpenMP threading on the multi-core host CPU increased the parallelism by an order of magnitude with over 60% parallel efficiency.
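The directive-based approach described above can be illustrated with a minimal sketch in C. Everything in it is an assumption made for illustration: the abstract does not give MBFLO3's source language, data layout, or kernels, so the names `idx`, `smooth_block`, and `smooth_constant_field` and the six-point averaging stencil are hypothetical. The sketch only shows the general pattern of threading the outer loops of a structured block with OpenMP while leaving the unit-stride inner loop for SIMD data parallelism. In an MPI code, each rank would call such a kernel on the blocks it owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flat index into one structured block stored with i fastest (unit stride).
 * Hypothetical layout, not taken from MBFLO3. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return i + nx * (j + ny * k);
}

/* One Jacobi-style averaging sweep over the interior points of a block.
 * Hypothetical kernel: it illustrates the directive pattern only. */
void smooth_block(const double *restrict in, double *restrict out,
                  size_t nx, size_t ny, size_t nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    /* Thread parallelism: the k and j loops are collapsed and shared
     * among OpenMP threads. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t k = 1; k < nz - 1; ++k) {
        for (size_t j = 1; j < ny - 1; ++j) {
            /* Data parallelism: the contiguous i loop is vectorized. */
            #pragma omp simd
            for (size_t i = 1; i < nx - 1; ++i) {
                out[idx(i, j, k, nx, ny)] =
                    (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                     in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                     in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]) / 6.0;
            }
        }
    }
}

/* Sanity helper: sweep a constant field on an n*n*n block and return the
 * value at the block centre, which should remain equal to `value`. */
double smooth_constant_field(size_t n, double value) {
    size_t total = n * n * n;
    double *in = malloc(total * sizeof *in);
    double *out = malloc(total * sizeof *out);
    assert(in != NULL && out != NULL);
    for (size_t p = 0; p < total; ++p) {
        in[p] = value;
        out[p] = value;
    }
    smooth_block(in, out, n, n, n);
    double centre = out[idx(n / 2, n / 2, n / 2, n, n)];
    free(in);
    free(out);
    return centre;
}
```

Compiled without OpenMP support, the pragmas are ignored and the loops run serially with the same result, which is one reason directive-based refactoring can keep a single source across platforms. Keeping the innermost loop on the unit-stride index is the kind of memory and loop restructuring that typically matters for both CPU vector units and accelerators.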