SC16 Salt Lake City, UT

10. Optimizing Application I/O by Leveraging the Storage Hierarchy Using the Scalable Checkpoint Restart Library with a Monte Carlo Particle Transport Application on the Trinity Advanced Computing System

Authors: Michael M. Pozulp (Lawrence Livermore National Laboratory), Gregory B. Becker (Lawrence Livermore National Laboratory), Patrick S. Brantley (Lawrence Livermore National Laboratory), Shawn A. Dawson (Lawrence Livermore National Laboratory), Kathryn Mohror (Lawrence Livermore National Laboratory), Adam T. Moody (Lawrence Livermore National Laboratory), Matthew J. O'Brien (Lawrence Livermore National Laboratory)

Abstract: The poster accompanying this summary describes our experience using the Scalable Checkpoint Restart library (SCR) to speed up I/O during checkpoint and restart. We ran Mercury, the Lawrence Livermore National Laboratory (LLNL) Monte Carlo particle transport code, on Trinity at Los Alamos National Laboratory (LANL). In a weak scaling study of checkpoint writes, we observed speedups at 16 nodes and above, including a maximum speedup of 30x at 4096 nodes. We benchmarked read performance by restarting from the checkpoints we wrote and observed speedups at 11 of the 12 node counts, including a maximum speedup of 9x at 2048 nodes. Finally, we ran a user problem in which SCR reduced the median time to checkpoint by 20x. Our results show that leveraging the storage hierarchy is necessary for optimizing application I/O.
