Presentation

Paper: A Multi-Faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems

Session: Clouds & Job Scheduling

Session Chair: Ali R. Butt

Event Type: Paper

Event Tags: Clouds and Distributed Computing, Intermediate, Introductory, Performance, System Software

Time: Thursday, November 17th, 4:30pm - 5pm

Location: 355-BC

Description: Job placement plays a pivotal role in application performance on supercomputers. We present a multi-faceted exploration of ways to influence job placement on extreme-scale systems in order to improve network performance and decrease variability. In our first exploration, Scores, we developed a machine learning model that extracts features from a job’s node allocation and grades its performance. This work identified several important node metrics, which led to Dual-Ended scheduling, a means of reducing network contention without impacting utilization. In evaluations on the Titan supercomputer, we observed reductions in average hop count of up to 50%. We also developed an improved node-layout strategy that targets a better balance between network latency and bandwidth; replacing the default ALPS layout on Titan with this strategy yielded an average runtime improvement of 10%. Both efforts underscore the importance of a job placement strategy that is cognizant of workload mixture and network topology.
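To make the hop-count metric mentioned in the description concrete, the following is a minimal sketch of how an average hop count for one job's node allocation could be computed on a 3D torus network. It is not the authors' implementation: the function names, the torus dimensions, the example allocations, and the choice to average over all node pairs are assumptions made for illustration only.

```python
from itertools import combinations

def torus_hops(a, b, dims):
    """Minimal hop count between two nodes on a torus with wraparound links."""
    # Per dimension, take the shorter of the direct and the wraparound path.
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def average_hop_count(nodes, dims):
    """Mean hop count over all node pairs in one allocation (assumed metric)."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0
    return sum(torus_hops(a, b, dims) for a, b in pairs) / len(pairs)

# Hypothetical 8x8x8 torus; not the dimensions of any real system.
DIMS = (8, 8, 8)

# Two hypothetical 4-node allocations: one compact, one scattered.
compact = [(x, y, 0) for x in range(2) for y in range(2)]
scattered = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (4, 4, 4)]

if __name__ == "__main__":
    print("compact:  ", average_hop_count(compact, DIMS))
    print("scattered:", average_hop_count(scattered, DIMS))
```

Under these assumptions, the compact allocation scores a lower average hop count than the scattered one, which is the kind of difference a placement strategy aware of network topology would try to exploit.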

Paper provided by the IEEE Computer Society
Paper also available from the ACM Digital Library

Authors

Christopher Zimmer (presenting), Oak Ridge National Laboratory

Saurabh Gupta, Oak Ridge National Laboratory

Scott Atchley, Oak Ridge National Laboratory

Sudharshan Vazhkudai, Oak Ridge National Laboratory

Carl Albing, US Naval Academy
