Memory Bandwidth and System Balance in HPC Systems
Session: Invited Talks - McGrayne and McCalpin
Session Chair: Irene Qualters
Presenter
Event Type: Invited Talk
Introductory
Performance
Location: Ballroom-EFGHIJ
Description: The "Attack of the Killer Micros" began approximately 25 years ago as microprocessor-based systems began to compete with supercomputers (in some application areas). It became clear that peak arithmetic rate was not an adequate measure of system performance for many applications, so in 1991 McCalpin introduced the STREAM Benchmark to estimate "sustained memory bandwidth" as an alternative performance metric.
STREAM apparently embodied a good compromise between generality and ease of use and quickly became the "de facto" standard for measuring and reporting sustained memory bandwidth in High Performance Computing systems.
Since the initial "attack", Moore's Law and Dennard Scaling have led to astounding increases in the computational capabilities of microprocessors. The technology behind memory subsystems has not enjoyed comparable performance improvements, causing sustained bandwidth to fall behind.
This talk reviews the history of the changing "balances" between computation, memory latency, and memory bandwidth in deployed HPC systems, then discusses how the underlying technology changes led to these market shifts. Key metrics are the increasing relative "cost" of memory accesses and the massive increases in concurrency that are required to obtain increasing memory throughput.
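The two quantities named above can be sketched with simple arithmetic. The following is a minimal illustration, not material from the talk: it assumes "balance" means peak floating-point operations per word of sustained memory traffic, and it uses Little's Law (data in flight = latency x bandwidth) to estimate the concurrency needed to sustain a given bandwidth. The function names, the 8-byte word, the 64-byte cache line, and all numeric inputs are hypothetical.

```python
def machine_balance(peak_gflops, sustained_gbs, word_bytes=8):
    """Peak FLOPs available per word of sustained memory traffic.

    Assumed definition: (peak FLOP/s) / (sustained words/s).
    Higher values mean memory is relatively more "expensive".
    """
    return peak_gflops * word_bytes / sustained_gbs


def required_concurrency(latency_ns, bandwidth_gbs, line_bytes=64):
    """Cache lines that must be in flight to sustain the given bandwidth.

    Little's Law: bytes in flight = latency * bandwidth.
    Nanoseconds times GB/s gives bytes directly (1e-9 * 1e9 = 1).
    """
    bytes_in_flight = latency_ns * bandwidth_gbs
    return bytes_in_flight / line_bytes


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the arithmetic.
    print(machine_balance(peak_gflops=1000, sustained_gbs=100))    # 80.0
    print(required_concurrency(latency_ns=100, bandwidth_gbs=100))  # 156.25
```

With these made-up inputs, raising the bandwidth target at a fixed latency raises the number of outstanding cache-line transfers in direct proportion, which is the sense in which more throughput demands more concurrency.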
A review of new technologies (such as "stacked DRAM") shows that the difficulties of delivering increased memory bandwidth are not alleviated unless the underlying computer architectures are changed in fundamental ways. The combination of technology trends and economic factors suggests that system balances will continue to shift in the same directions - favoring workloads with increasingly high compute intensity and increasing available concurrency.