43. MuMMI_R: Analyzing and Modeling Power and Time Under Different Resilience Strategies
Authors: Xingfu Wu (Texas A&M University), Valerie Taylor (Texas A&M University), Zhiling Lan (Illinois Institute of Technology)
Abstract: While reducing execution time is still a major objective for high performance computing, future systems and applications will have additional power and resilience requirements that represent a multidimensional tuning challenge. In this poster we present MuMMI_R: analyzing and modeling power and time under different resilience strategies. We use the FTI (Fault Tolerance Interface) library to conduct experiments that evaluate how FTI checkpointing at different levels and frequencies affects the power consumption of different node components (Node, CPU, Memory, Disk and Network) and the energy consumption of the MPI memory benchmark STREAM on three architectures: IBM BG/Q, Intel Haswell and AMD Kaveri. Our experimental results provide a better understanding of the tradeoffs among runtime, power and resilience.
Two-page extended abstract: pdf