04. Exploring Randomized Multipath Routing on Multi-Dimensional Torus Networks
Authors: Prajakt Shastry (Illinois Institute of Technology), Daniel Parker (University of Chicago), Sanjiv Kapoor (Illinois Institute of Technology), Ioan Raicu (Illinois Institute of Technology)
Abstract: Network performance is a critical aspect of HPC, and improving performance is a major goal in the design of future systems. This work proposes to improve network performance through new routing algorithms, leveraging the rich multi-path topologies of multi-dimensional torus networks commonly found in supercomputers built in the past fifteen years. Virtually all torus networks in production today use the dimension-order routing algorithm, which is essentially a static and deterministic strategy for internode communication. This static routing strategy has significant load-balancing implications, leading to sub-par performance. We propose a new Random Distance Routing algorithm, which randomly distributes packets among the neighboring nodes that are closer to the destination, leading to a globally load-balanced network. Using the CODES/ROSS simulator, we show that the proposed randomized multi-path routing algorithm can increase the throughput of a 5D torus network by 1.6x and reduce latency by 40%.
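The core idea in the abstract, forwarding each packet to a randomly chosen neighbor that is closer to the destination, can be illustrated with a short Python sketch. This is not the authors' implementation or their CODES/ROSS model: the function names (torus_distance, neighbors, random_closer_hop, route) are hypothetical, and the uniform choice among closer neighbors is an assumption, since the abstract does not say how the random selection is weighted.

```python
import random

def torus_distance(a, b, dims):
    """Minimal hop count between coordinates a and b, using wraparound links."""
    return sum(min((x - y) % k, (y - x) % k) for x, y, k in zip(a, b, dims))

def neighbors(node, dims):
    """Nodes one hop away: +1 and -1 in each dimension, with wraparound."""
    result = set()
    for i, k in enumerate(dims):
        for step in (1, -1):
            nxt = list(node)
            nxt[i] = (nxt[i] + step) % k
            result.add(tuple(nxt))
    result.discard(node)  # a dimension of size 1 wraps onto the node itself
    return sorted(result)

def random_closer_hop(node, dest, dims, rng=random):
    """Pick one neighbor that is strictly closer to dest.

    Assumption: the choice is uniform over all closer neighbors.
    """
    here = torus_distance(node, dest, dims)
    closer = [n for n in neighbors(node, dims)
              if torus_distance(n, dest, dims) < here]
    return rng.choice(closer)

def route(src, dest, dims, rng=random):
    """Follow random closer hops from src until dest is reached."""
    path = [src]
    while path[-1] != dest:
        path.append(random_closer_hop(path[-1], dest, dims, rng))
    return path

if __name__ == "__main__":
    dims = (4, 4, 4, 4, 2)  # hypothetical small 5D torus
    print(route((0, 0, 0, 0, 0), (2, 1, 3, 0, 1), dims))
```

Because every hop reduces the remaining distance by exactly one, each path in this sketch is still a shortest path. Only the order in which dimensions are traversed varies from packet to packet, whereas dimension-order routing always resolves the dimensions in the same fixed order.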
Two-page extended abstract: pdf