Charting the PMIx Roadmap
Authors: Dr. Ralph Castain (Intel Corporation)
Abstract: The PMI Exascale (PMIx) community will conclude its second year of existence this fall, a year that included the release of several implementations in both commercial and open source resource managers. We'll discuss what PMIx has accomplished over the past year and present a proposed roadmap for next year.
The PMIx community includes viewpoints from across the HPC runtime community. To that end, we solicit feedback and suggestions on the roadmap in advance of the session, and will include time for a lively discussion at the meeting. So please join us at the BOF to plan the roadmap. New contributors welcome!
Long Description: The Process Management Interface (PMI) has been used for quite some time
as a means of supporting HPC programming models, both for wireup
of communication channels and exchange of general application-level
information. However, the evolving effort to achieve exascale performance
and beyond has placed new strains on that support, and has introduced an
ever-expanding set of requirements for interactions between applications and
the host resource manager.
PMI Exascale (PMIx) represents an attempt to address these issues by providing
an extended version of the PMI definition specifically designed to support clusters
up to and including exascale sizes.
PMIx supports the current PMI-1 and PMI-2 APIs and also:
(a) augments the APIs to eliminate current restrictions that impact scalability;
(b) extends the capability of applications to interact with the resource manager (RM); and
(c) provides a standalone library (including both client and server support) to
ease adoption of the desired capabilities while removing licensing issues that
exist in some current implementations.
Features that improve scalability include:
* Reduced memory footprint:
- a distributed approach to database organization;
- a data scoping feature providing several levels of locality to describe the set of
processes that may be interested in a particular piece of information; and
- one database instance per node, with "zero-message" data access using shared memory.
* Reduced amount of communication:
- data scoping, which helps exclude local-only data from inter-node communication;
- flexible scoping of collectives; and
- "direct modex" for applications with sparse communication graphs.
New application-level features include the ability to request:
* changes in power policy and settings;
* positioning of files for use by the application or another job step
within the same allocated session;
* notification of errors at the application and/or system level, including
warning of predicted failures for preemptive response;
* error response actions, including allocation of replacement resources and
launch of replacement processes;
* dynamic modification of allocations, including expansion and/or partial
release of the existing allocation, and new allocations for subsequent spawn;
* storage policies such as hot/warm/cold locations, burst buffer management,
and persistence of files and/or shared memory regions across job steps within
the same allocated session; and
* fabric QoS and security constraints, plus information on network topology.
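To make the data scoping and "direct modex" items in the list above more concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the names Scope, Node, Cluster, place, put, and get are hypothetical and are not the PMIx API, and the model assumes just two locality levels. It illustrates the general idea that local-only data never leaves its node and that remote data is fetched only when a process actually asks for it.

```python
from enum import Enum


class Scope(Enum):
    """Hypothetical locality levels; not the PMIx constants."""
    LOCAL = "local"    # of interest only to processes on the same node
    GLOBAL = "global"  # of interest to processes on any node


class Node:
    """Toy stand-in for the single per-node database instance."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # (rank, key) -> (scope, value), published on this node
        self.cache = {}   # (rank, key) -> value, fetched from other nodes

    def put(self, rank, key, value, scope):
        self.store[(rank, key)] = (scope, value)


class Cluster:
    """Routes lookups between nodes and counts inter-node messages."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.home = {}      # rank -> name of the node hosting that rank
        self.messages = 0   # inter-node fetches performed so far

    def place(self, rank, node_name):
        self.home[rank] = node_name

    def get(self, from_node, rank, key):
        """Fetch on demand: nothing crosses nodes until someone asks."""
        here = self.nodes[from_node]
        owner = self.nodes[self.home[rank]]
        if owner is here:
            return here.store[(rank, key)][1]   # same node: no message
        if (rank, key) in here.cache:
            return here.cache[(rank, key)]      # fetched earlier: no message
        scope, value = owner.store[(rank, key)]
        if scope is Scope.LOCAL:
            raise KeyError(f"{key!r} is local to {owner.name}")
        self.messages += 1                      # one inter-node fetch
        here.cache[(rank, key)] = value
        return value


# Usage: rank 0 lives on n0 and publishes one local and one global item.
cluster = Cluster([Node("n0"), Node("n1")])
cluster.place(0, "n0")
cluster.place(1, "n1")
n0 = cluster.nodes["n0"]
n0.put(0, "shm_path", "/dev/shm/job0", Scope.LOCAL)
n0.put(0, "endpoint", "fabric://n0:7", Scope.GLOBAL)

remote_value = cluster.get("n1", 0, "endpoint")   # first remote read: one fetch
cluster.get("n1", 0, "endpoint")                  # served from n1's cache
local_value = cluster.get("n0", 0, "shm_path")    # same-node read
```

In this sketch only one inter-node fetch occurs, no matter how many items rank 0 publishes, which is the property that makes on-demand lookup attractive when each process talks to only a few peers.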
We consider community interaction
vital to the future of PMIx and the development of the project's roadmap.
To use the BOF time effectively, we are soliciting questions online
before the BOF. Please send us your questions, comments, and feedback for discussion during the BOF.
In this BOF, we will present the current state of the PMIx effort,
describe its planned directions, and stimulate a discussion regarding
desired features and other elements of the roadmap. Here are some of
the highlights of what will be covered during the discussions:
- An overview of PMIx
- State-of-the-Union of RM and programming model support
- PMIx 1.x release status
- PMIx v2.0 release status and feature list
Be part of the discussion: submit your questions ahead of time,
come contribute to the roadmap, and see how you can (and should!) join
Conference Presentation: pdf