Compiler-Directed Lightweight Checkpointing for Fine-Grained Guaranteed Soft Error Recovery
Session: Resilience and Error Handling
Session Chair: Sriram Krishnamoorthy
Event Type
Paper
Intermediate
Introductory
Programming Systems
Reproducibility
Resiliency
Location: 355-BC
Description: This paper presents Bolt, a compiler-directed soft error recovery scheme that provides fine-grained, guaranteed recovery without excessive performance or hardware overhead. To eliminate the need for expensive hardware support, the compiler protects the architectural inputs throughout their entire liveness period by safely checkpointing the last updated value in idempotent regions. To minimize performance overhead, Bolt uses a novel compiler analysis that eliminates checkpoints whose values can be reconstructed from other checkpointed values, without compromising the recovery guarantee. As a result, Bolt incurs only 4.7% performance overhead on average, a reduction of up to 57% compared with state-of-the-art schemes that require expensive hardware support to provide the same recovery guarantee.
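
The checkpoint-elimination idea in the description can be illustrated with a toy model. The sketch below is not Bolt's actual analysis: it assumes a hypothetical `recipes` table that maps a register to the operand registers and pure function that recompute it, and it ignores idempotent regions and liveness analysis entirely. The names `prune_checkpoints` and `recover` are invented for illustration. A register's checkpoint is dropped only when all of its operands stay checkpointed, so every live value is still recoverable.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical recipe table: register -> (operand registers, pure function).
Recipe = Tuple[Tuple[str, ...], Callable[..., int]]


def prune_checkpoints(live: Set[str], recipes: Dict[str, Recipe]) -> Set[str]:
    """Return the subset of `live` registers that still need a checkpoint."""
    kept = set(live)
    pinned: Set[str] = set()  # operands that some pruned register relies on
    for reg in sorted(live):
        # A pinned register must stay checkpointed; one without a recipe
        # cannot be recomputed at all.
        if reg in pinned or reg not in recipes:
            continue
        operands, _ = recipes[reg]
        # Prune only if every operand is itself still checkpointed.
        if reg not in operands and all(op in kept for op in operands):
            kept.discard(reg)
            pinned.update(operands)
    return kept


def recover(checkpoints: Dict[str, int], live: Set[str],
            recipes: Dict[str, Recipe]) -> Dict[str, int]:
    """Rebuild every live register from the surviving checkpoints."""
    restored = dict(checkpoints)
    for reg in sorted(live):
        if reg not in restored:
            operands, fn = recipes[reg]
            restored[reg] = fn(*(checkpoints[op] for op in operands))
    return restored


# Toy region: r3 = r1 + r2, so r3's checkpoint is redundant.
live_regs = {"r1", "r2", "r3"}
toy_recipes = {"r3": (("r1", "r2"), lambda a, b: a + b)}
kept_regs = prune_checkpoints(live_regs, toy_recipes)
state = recover({"r1": 5, "r2": 7}, live_regs, toy_recipes)
```

In this toy region only `r1` and `r2` remain checkpointed, and `r3` is recomputed during recovery. Pinning operands is a deliberately conservative choice: it rules out chains or cycles of reconstructed values, which keeps the recoverability argument trivial at the cost of pruning fewer checkpoints.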











