73. Tapas: An Implicitly Parallel Programming Framework For Hierarchical N-Body Algorithms
Authors: Keisuke Fukuda (Tokyo Institute of Technology), Motohiko Matsuda (RIKEN), Naoya Maruyama (RIKEN), Rio Yokota (Tokyo Institute of Technology), Kenjiro Taura (University of Tokyo), Satoshi Matsuoka (Tokyo Institute of Technology)
Abstract: Tapas is our new C++ programming framework for hierarchical algorithms, such as n-body methods, on large-scale heterogeneous supercomputers. Encapsulating the complexity of these algorithms in a library or framework has been challenging because of their irregular data access over distributed memory. Tapas solves this by converting the user’s implicitly parallel code into an inspector-executor style program using only C++ template metaprogramming. A prototype implementation of the Fast Multipole Method (FMM) on Tapas demonstrates performance and scaling comparable to ExaFMM, the fastest hand-tuned implementation of FMM, as well as efficient use of hundreds of GPUs. Specifically, serial execution is 15% faster than ExaFMM, whereas the distributed-memory strong-scaling evaluation on up to 1500 CPU cores achieves 64% to 81% of ExaFMM's performance. The multi-GPU version achieves a 2.5x speedup over the CPU version when executed on 100 nodes of TSUBAME2.5 with 300 GPUs.
Two-page extended abstract: pdf