SC16 Salt Lake City, UT

Charting the PMIx Roadmap

Authors: Dr. Ralph Castain (Intel Corporation)

BP Abstract: The PMI Exascale (PMIx) community will conclude its second year of existence this fall, a year that included the release of several implementations in both commercial and open source resource managers. We'll discuss what PMIx has accomplished over the past year and present a proposed roadmap for next year. The PMIx community includes viewpoints from across the HPC runtime community. To that end, we solicit feedback and suggestions on the roadmap in advance of the session, and will include time for a lively discussion at the meeting. Please join us at the BOF to plan the roadmap. New contributors are welcome!

Long Description: The Process Management Interface (PMI) has long been used to support HPC programming models, both for wireup of communication channels and for exchange of general application-level information. However, the evolving effort to achieve exascale performance and beyond has placed new strains on that support and has introduced an ever-expanding set of requirements for interactions between applications and the host resource manager (RM). PMI Exascale (PMIx) represents an attempt to address these issues by providing an extended version of the PMI definition specifically designed to support clusters up to and including exascale sizes. PMIx supports the current PMI-1 and PMI-2 APIs and also: (a) augments the APIs to eliminate current restrictions that impact scalability; (b) extends the capability of applications to interact with the RM; and (c) provides a standalone library (including both client and server support) to ease adoption of the desired capabilities while removing licensing issues that exist in some current implementations.

Scalability features include:

* Reduced memory footprint:
  - a distributed approach to database organization;
  - a data scoping feature providing several levels of locality to describe the set of processes that may be interested in a particular piece of information;
  - one instance of the database per node, with "zero-message" data access using shared memory.
* Reduced amount of communication:
  - data scoping helps to exclude local-only data from inter-node communication;
  - flexible scoping of collectives;
  - "direct modex" for applications with sparse communication graphs.

New application-level features include the ability to request:

* changes in power policy and settings;
* positioning of files for use by the application or another job step within the same allocated session;
* notification of errors at the application and/or system level, including warning of predicted failures for preemptive response;
* error response actions, including allocation of replacement resources and launch of replacement processes;
* dynamic modification of allocations, including expansion and/or partial release of the existing allocation, and new allocations for subsequent spawn requests;
* storage policies such as hot/warm/cold locations, burst buffer management, and persistence of files and/or shared memory regions across job steps within the same allocated session; and
* fabric QoS and security constraints, plus information on network topology.

We consider community interaction vital to the future of PMIx and the development of the project's roadmap. To use the BOF time effectively, we are soliciting questions via the web before the BOF. Please send us your questions, comments, and feedback for discussion during the BOF:

In this BOF, we will present the current state of the PMIx effort, describe its planned directions, and stimulate a discussion regarding desired features and other elements of the roadmap. Here are some of the highlights of what will be covered during the discussions:

- An overview of PMIx
- State-of-the-Union of RM and programming model support
- PMIx 1.x release status
- PMIx v2.0 release status and feature list

Be part of the discussion: submit your questions ahead of time, come contribute to the roadmap, and see how you can (and should!) join our efforts.
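The data scoping and "direct modex" ideas mentioned in the description can be illustrated with a small toy model. The sketch below is not the PMIx API: the names `Scope`, `NodeStore`, and `Cluster`, along with the keys and values, are hypothetical and chosen only for illustration. It assumes one store per node, a scope tag on each published value, and a counter for inter-node messages.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    LOCAL = "local"    # of interest only to processes on the same node
    GLOBAL = "global"  # of interest to processes on any node


@dataclass
class NodeStore:
    """Toy stand-in for the single per-node database (hypothetical)."""
    node: str
    entries: dict = field(default_factory=dict)  # (rank, key) -> (scope, value)

    def put(self, rank, key, value, scope):
        self.entries[(rank, key)] = (scope, value)

    def exportable(self):
        """Entries that may leave the node: everything not scoped LOCAL."""
        return {k: v for k, (s, v) in self.entries.items()
                if s is not Scope.LOCAL}


class Cluster:
    """Counts inter-node messages so access patterns can be compared."""

    def __init__(self, stores):
        self.stores = {s.node: s for s in stores}
        self.messages = 0

    def get(self, from_node, owner_node, rank, key):
        """Fetch one value on demand, in the spirit of a direct modex."""
        store = self.stores[owner_node]
        if from_node == owner_node:
            # Same node: read the shared store, no message needed.
            return store.entries[(rank, key)][1]
        # Different node: one request, and only non-local data is visible.
        self.messages += 1
        return store.exportable()[(rank, key)]


n0, n1 = NodeStore("n0"), NodeStore("n1")
n0.put(0, "shmem_path", "/dev/shm/job0", Scope.LOCAL)
n0.put(0, "endpoint", "fabric:0x1a", Scope.GLOBAL)
cluster = Cluster([n0, n1])

print(sorted(n0.exportable()))
print(cluster.get("n1", "n0", 0, "endpoint"), cluster.messages)
```

In this model, the locally scoped value never appears in what a node exports, and a remote value costs a message only when some process actually asks for it, which is the property that matters for applications with sparse communication graphs.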

Conference Presentation: pdf
