SC16 Salt Lake City, UT

Multi-Kernel OSes for Extreme-Scale HPC

Authors: Dr. Rolf Riesen (Intel Corporation)

Abstract: Lightweight multi-kernel operating systems, in which Linux and a lightweight kernel run side-by-side on a compute node, have recently received attention for their potential to solve many of the challenges system software faces as we move toward ever more complex and parallel systems. The four leading multi-kernel projects are at a stage where input from users, runtime developers, and programming model architects is sought to guide further design and implementation. This BoF is an ideal venue to engage the community, foster participation, and initiate long-term collaboration.

Long Description: High-end HPC faces increasing numbers of CPU cores, heterogeneous architectures, a deepening memory hierarchy, and more complex memory technologies. The extreme degrees of parallelism and rapidly evolving hardware environments demand system software that is capable of executing large-scale parallel applications while nimbly adapting to new hardware and programming models. Additionally, the increasing prevalence of more complex application constructs, such as in-situ analysis and workflow composition, together with the growing reliance on sophisticated tools, dictates the need for the rich programming environment of POSIX/Linux. While lightweight kernels (LWKs) have proven capable of providing scalable execution environments on extreme-scale systems, they lack most of the functionality that Linux provides and that users have come to expect. Recently, lightweight multi-kernels have emerged to reconcile these conflicting requirements at extreme scale. Multi-kernels run Linux and an LWK side-by-side on the cores of a compute node. The basic idea is that the LWK can provide scalable execution and, thanks to its small code base, can also rapidly adapt to new hardware and software requirements. At the same time, through its interplay with Linux, a multi-kernel system can support the full POSIX/Linux APIs. Although multiple research projects now pursue this direction, how node resources are shared between the two types of kernels, how exactly the two kernels interact, and to what extent they are integrated remain subjects of ongoing debate and research. While projects like ARGO and Hobbes address system-wide operating system aspects to enable workflows and global resource management, this BoF concentrates on the most active multi-kernel projects that aim to provide compute-node OSes for extreme-scale machines.
With that in mind, we picked four projects: mOS at Intel, IHK/McKernel at RIKEN, Kitten at Sandia National Laboratories, and FFMK at TU Dresden. After a quick introduction to multi-kernels, we will invite a representative of each project to give a brief overview and current status. We will then switch gears to an open discussion with the audience. The multi-kernel designers are seeking input from the community (users, application developers, and runtime system developers) on what they need and expect from an extreme-scale OS, how they would like future hardware resources to be exposed and managed, and what opportunities exist to co-design new OSes with higher-level runtime systems that will allow applications to scale to exascale and beyond.
