SRC07. Job Startup at Exascale: Challenges and Solutions
Student: Sourav Chakraborty (Ohio State University)
Supervisor: Dhabaleswar K. Panda (Ohio State University)
Abstract: We identify the major performance bottlenecks and memory constraints in bootstrapping MPI and PGAS applications and propose scalable solutions for them. We introduce on-demand connection management to OpenSHMEM to reduce startup time. We also propose extensions to the PMI (Process Management Interface) standard that reduce the amount of data transferred over the network and allow this communication to overlap with other initialization tasks. Finally, we introduce a shared-memory-based design for PMI that reduces its memory footprint by a factor equal to the number of processes per node (PPN). Our evaluation shows that, with sufficient overlap, near-constant initialization time can be achieved at any process count for MPI and hybrid MPI+PGAS applications. The time taken by MPI_Init is reduced by 2.88 times at 16,384 processes, and OpenSHMEM initialization time is improved by 29.6 times at 8,192 processes. The estimated memory footprint of PMI is reduced by nearly 1 GB with 1 million processes and 16 PPN.
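The following is a minimal toy model, not the authors' implementation, of two ideas from the abstract. First, it shows why on-demand connection management cuts startup cost: eager setup creates an endpoint to every peer during initialization, while on-demand setup creates one only when a peer is first used. Second, it shows why a shared-memory PMI design reduces memory by a factor of PPN: one copy of the key-value store per node instead of one per process. The names `Process`, `connections_after_ring`, and `pmi_bytes_per_node`, the ring communication pattern, and the treatment of a connection as a set entry are all hypothetical simplifications. A real runtime would perform a network handshake at that point.

```python
class Process:
    """Hypothetical process that tracks the peers it has endpoints to."""

    def __init__(self, rank, nprocs, on_demand):
        self.rank = rank
        self.nprocs = nprocs
        self.connected = set()  # peers with an established endpoint
        if not on_demand:
            # Eager setup: connect to every other peer during initialization.
            self.connected = {p for p in range(nprocs) if p != rank}

    def send(self, peer, payload):
        # On-demand setup: create the endpoint the first time a peer is used.
        # Adding to the set stands in for the real connection handshake.
        if peer not in self.connected:
            self.connected.add(peer)
        return (self.rank, peer, payload)


def connections_after_ring(nprocs, on_demand):
    """Total endpoints created when each rank sends once to its right neighbor."""
    procs = [Process(r, nprocs, on_demand) for r in range(nprocs)]
    for p in procs:
        p.send((p.rank + 1) % nprocs, b"x")
    return sum(len(p.connected) for p in procs)


def pmi_bytes_per_node(ppn, kvs_bytes, shared):
    """Per-node PMI key-value store memory (hypothetical model).

    Without shared memory, each of the ppn processes on a node keeps its own
    copy of the store. With shared memory, the node keeps a single copy.
    """
    return kvs_bytes if shared else ppn * kvs_bytes
```

For example, with 8 processes, `connections_after_ring(8, False)` creates 56 endpoints (7 per process), whereas `connections_after_ring(8, True)` creates only the 8 the ring actually uses. Likewise, `pmi_bytes_per_node(16, kvs_bytes, False)` is exactly 16 times `pmi_bytes_per_node(16, kvs_bytes, True)`, which corresponds to the factor-of-PPN reduction described above. The absolute byte counts depend on the real key-value store size and are not modeled here.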