InfiniBand and High-Speed Ethernet: Advanced Features, Challenges in Designing HEC Systems, and Usage
Event Type: Tutorial
Tags: Accelerators, Advanced, Architectures, Intermediate, Networks
Location: 355-C
Description: As InfiniBand (IB) and High-Speed Ethernet (HSE) technologies mature, they are being used to design and deploy a range of High-End Computing (HEC) systems: HPC clusters with accelerators (GPGPUs and MIC) supporting MPI; storage and parallel file systems; cloud computing systems with SR-IOV virtualization; Big Data systems with Hadoop (HDFS, MapReduce and HBase); multi-tier datacenters with Web 2.0 (memcached); and grid computing systems. These systems bring new challenges in performance, scalability, portability, reliability and network congestion. Many scientists, engineers, researchers, managers and system administrators are interested in learning about these challenges, the approaches being used to solve them, and their impact on performance and scalability. This tutorial will start with an overview of these systems. It will emphasize the advanced hardware and software features of IB, HSE and RoCE and how they address these challenges. Next, it will focus on OpenFabrics RDMA and libfabric programming, along with the network management infrastructure and tools needed to use these systems effectively. A common set of challenges faced while designing these systems will then be presented. Finally, case studies will examine domain-specific challenges in designing these systems (including the associated software stacks), their solutions, and sample performance numbers.