PMIx - HPC Process Management Interface
Use-Cases…implements…
- …async event notification…
- …cross-model notification…
- …between OpenMP & OpenMPI
- …coordinates resource utilization…programming blocks
- …allocation support
- …dynamically add & remove nodes
- …register pre-emption
- …dynamic process groups
- …async group construction & destruction
- …process failure notification
- …generalizes tool support…
- …co-launch daemons with jobs
- …forward stdio channels
- …query jobs…system info…network traffic…process counts
- …storage integration…pre-cache files
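Async event notification (including process-failure notification) boils down to registering callbacks for event codes and having the runtime invoke them. A toy C registry illustrating that idea; all names and event codes are hypothetical, and it dispatches synchronously, whereas PMIx (PMIx_Register_event_handler / PMIx_Notify_event) delivers events asynchronously.

```c
#include <stddef.h>

/* Toy sketch of event notification as a callback registry.
 * Hypothetical names and event codes, NOT the PMIx API; real PMIx
 * uses PMIx_Register_event_handler / PMIx_Notify_event and delivers
 * events asynchronously. */

typedef enum { TOY_EVT_PROC_FAILED = 1, TOY_EVT_PREEMPT = 2 } toy_event;

typedef void (*toy_handler)(toy_event evt, int source_rank, void *ctx);

#define MAX_HANDLERS 16

typedef struct {
    toy_event   evt;
    toy_handler fn;
    void       *ctx;
} toy_registration;

static toy_registration registry[MAX_HANDLERS];
static size_t nregistered = 0;

/* Register fn to be called whenever evt is raised; -1 if full or fn is NULL. */
int toy_register(toy_event evt, toy_handler fn, void *ctx)
{
    if (nregistered == MAX_HANDLERS || fn == NULL) return -1;
    registry[nregistered++] = (toy_registration){ evt, fn, ctx };
    return 0;
}

/* Deliver an event to every handler registered for that code and
 * return how many handlers ran (synchronously, unlike real PMIx). */
size_t toy_notify(toy_event evt, int source_rank)
{
    size_t delivered = 0;
    for (size_t i = 0; i < nregistered; i++) {
        if (registry[i].evt == evt) {
            registry[i].fn(evt, source_rank, registry[i].ctx);
            delivered++;
        }
    }
    return delivered;
}

/* Example handler: stores the failed rank into the int pointed to by ctx. */
void record_failed_rank(toy_event evt, int source_rank, void *ctx)
{
    (void)evt;
    *(int *)ctx = source_rank;
}
```

A handler registered only for failures ignores pre-emption events, which is the filtering a real runtime does by event code.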
Integration…where is it used…
- …libraries…most MPI libraries including OpenMPI
- …cluster resource managers…most including Slurm
- …integrates with…
- …debuggers (TotalView, DDT)
- …Spark, TensorFlow
- …logging information
- …containers…most including Apptainer, Docker
Compatibility with OpenMPI…
- …native support since Open MPI version 3.1
- …recommended to build OpenMPI using an external PMIx installation
- …both OpenMPI and PMIx need to be built against the same libevent/hwloc
- …--with-libevent=PATH and/or --with-hwloc=PATH
Slurm support matrix…
Slurm version | Supported PMIx versions |
---|---|
16.05+ | v1.2+ (specifically not v2.x and above) |
17.11.0+ | v1.2+ and v2.x |
20.11+ | v1.2+, v2.x and v3.x |
22.05+ | v2.x, v3.x, v4.x and v5.x |
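A minimal C sketch of the matrix above as a lookup function, e.g. to sanity-check a planned Slurm/PMIx pairing. The function name and the (year, month) version encoding are assumptions for illustration; the SchedMD MPI documentation remains the authority.

```c
#include <stdbool.h>

/* Sketch only: encodes the Slurm/PMIx support matrix from these notes.
 * Hypothetical helper; verify against SchedMD docs for your release. */

/* Slurm versions are passed as (year, month), e.g. 20.11 -> (20, 11);
 * PMIx versions as (major, minor), e.g. v3.1 -> (3, 1). */
bool slurm_supports_pmix(int slurm_major, int slurm_minor,
                         int pmix_major, int pmix_minor)
{
    /* Combine year.month into one comparable number: 20.11 -> 2011. */
    int slurm = slurm_major * 100 + slurm_minor;

    /* "v1.2+" in the table means 1.2 or newer within the v1 series. */
    bool pmix_v1_2_plus = (pmix_major == 1 && pmix_minor >= 2);

    if (slurm >= 2205) /* 22.05+: v2.x, v3.x, v4.x and v5.x */
        return pmix_major >= 2 && pmix_major <= 5;
    if (slurm >= 2011) /* 20.11+: v1.2+, v2.x and v3.x */
        return pmix_v1_2_plus || pmix_major == 2 || pmix_major == 3;
    if (slurm >= 1711) /* 17.11.0+: v1.2+ and v2.x */
        return pmix_v1_2_plus || pmix_major == 2;
    if (slurm >= 1605) /* 16.05+: v1.2+ only, not v2.x and above */
        return pmix_v1_2_plus;
    return false;      /* older releases are not in the table */
}
```

For example, slurm_supports_pmix(20, 11, 3, 1) is true, while slurm_supports_pmix(22, 5, 1, 2) is false because 22.05+ no longer lists v1.x.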
PMI (Process Management Interface)…
- …provides a common abstraction to HPC resource managers
- …responsible for interaction between a Resource Manager (RM) and a parallel application
- …decouple process management from the underlying process manager
- …process manager (PM) serves several purposes for parallel applications…
- …handle start/stop of processes
- …aggregation of I/O channels std{in|out|err}
- …environment and signal propagation
- …central coordination point of parallel processes
Used by MPI libraries to interact with any compliant system (like Slurm):
- …requests the PM to start processes on the nodes of a parallel machine
- …propagate startup data with PMI out-of-band communication
- …processes use out-of-band communication to setup MPI communication
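The out-of-band wireup above follows a put / exchange / get pattern. A toy, single-process C model of it, assuming hypothetical toy_* names and a fixed-size table; in PMIx the corresponding calls are PMIx_Put, PMIx_Commit, PMIx_Fence and PMIx_Get, and the real exchange crosses nodes via the process manager.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of a PMI-style "business card" exchange: each rank stages
 * its endpoint, a fence makes all values visible, then ranks look up
 * peers. Hypothetical names; not the PMIx API. */

#define MAX_RANKS 8
#define VAL_LEN   64

typedef struct {
    int  nranks;
    char staged[MAX_RANKS][VAL_LEN];    /* put, not yet exchanged */
    char published[MAX_RANKS][VAL_LEN]; /* visible after the fence */
    int  fenced;                        /* 1 once a fence completed */
} toy_kvs;

int toy_init(toy_kvs *kvs, int nranks)
{
    if (nranks <= 0 || nranks > MAX_RANKS) return -1;
    memset(kvs, 0, sizeof *kvs);
    kvs->nranks = nranks;
    return 0;
}

/* A rank stages its own endpoint (e.g. an address:port string). */
int toy_put(toy_kvs *kvs, int rank, const char *endpoint)
{
    if (rank < 0 || rank >= kvs->nranks) return -1;
    snprintf(kvs->staged[rank], VAL_LEN, "%s", endpoint);
    return 0;
}

/* Stand-in for the collective exchange: the PM would move data
 * out-of-band between nodes; here we just copy staged -> published. */
void toy_fence(toy_kvs *kvs)
{
    memcpy(kvs->published, kvs->staged, sizeof kvs->published);
    kvs->fenced = 1;
}

/* Look up a peer's endpoint; NULL before the fence or for bad ranks. */
const char *toy_get(const toy_kvs *kvs, int peer)
{
    if (!kvs->fenced || peer < 0 || peer >= kvs->nranks) return NULL;
    return kvs->published[peer];
}
```

The fence is the key design point: no rank reads peer data until every rank has published, which is what lets the MPI layer then open direct connections.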
PMIx (Process Management Interface - Exascale)…
- …defines standard APIs (not the implementation)
- …open source reference implementation of the standard
- …fully supports both of the existing PMI-1 and PMI-2 APIs
- …auto-negotiation messaging protocol from v2.1.x onwards
- …originally developed as an internal part of Open MPI…
- …attempts to hide most of these details from the end user
- …will translate configuration directives to PMIx and PRRTE
Three distinct entities…
- PMIx Standard…defines APIs, attribute strings…no details on implementation
- PMIx Reference Library
- …implementation of the standard (with all features)
- …for example OpenPMIx
- PRRTE
- …“PRTE” is the operational name in tools and wrappers
PMIx Reference Runtime Environment…
- …full-featured PMIx environment
- …affiliated with OpenPMIx library
- …per-user development environment
- …shim for environments missing some PMIx functionality
- Derived from OpenRTE (ORTE)
- …forked from OpenMPI
- …standalone project in PMIx community
- …distributed with PMIx
- …replaces ORTE in OpenMPI 5+
- Persistent DVM (distributed virtual machine)
- …launch daemons on allocated nodes
- …users launch applications against the DVM
- …tear down DVM when user session ends
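The persistent-DVM lifecycle can be sketched as a small state machine: start the DVM (prte), launch any number of jobs against it (prun), tear it down (pterm). A toy C model under that assumption; the states and function names are hypothetical and no daemons are actually started.

```c
#include <stdbool.h>

/* Toy model of the persistent DVM workflow. Hypothetical names;
 * nothing is launched, only the allowed transitions are modelled. */

typedef enum { DVM_DOWN, DVM_UP, DVM_TERMINATED } dvm_state;

typedef struct {
    dvm_state state;
    int       daemons;       /* one daemon per allocated node */
    int       jobs_launched;
} toy_dvm;

/* Like "prte": start daemons on every allocated node. */
bool dvm_start(toy_dvm *dvm, int allocated_nodes)
{
    if (dvm->state != DVM_DOWN || allocated_nodes <= 0) return false;
    dvm->daemons = allocated_nodes;
    dvm->jobs_launched = 0;
    dvm->state = DVM_UP;
    return true;
}

/* Like "prun": jobs reuse the running daemons, no new daemon launch. */
bool dvm_launch_job(toy_dvm *dvm)
{
    if (dvm->state != DVM_UP) return false;
    dvm->jobs_launched++;
    return true;
}

/* Like "pterm": stop all daemons when the user session ends. */
bool dvm_terminate(toy_dvm *dvm)
{
    if (dvm->state != DVM_UP) return false;
    dvm->daemons = 0;
    dvm->state = DVM_TERMINATED;
    return true;
}
```

The point of the design is amortisation: daemon startup is paid once per session, while each job launch only talks to daemons that are already running.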
Commands…
prte …start DVM
prun …launch jobs
- …--omca …OMPI parameters
- …--pmixmca …PMIx parameters
- …--prtemca …PRRTE parameters
- …generic --mca picks best match
prte_info …build information
prted …PRRTE daemon for remote nodes
pterm …stop DVM
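Each MCA option targets one project, and each project also reads MCA parameters from environment variables with its own prefix (OMPI_MCA_, PMIX_MCA_, PRTE_MCA_). A C sketch of that routing; the framework lists are a tiny sample, and guess_project() is only a hypothetical illustration of what a generic --mca "best match" could look like, not PRRTE's actual logic.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: route an MCA parameter to a project and build
 * the matching environment variable name. Framework lists are small,
 * incomplete samples; guess_project() is hypothetical. */

typedef enum { PROJ_OMPI, PROJ_PMIX, PROJ_PRTE } project;

static const char *env_prefix[] = { "OMPI_MCA_", "PMIX_MCA_", "PRTE_MCA_" };

static const char *pmix_frameworks[] = { "ptl", "gds", "psec", "bfrops", NULL };
static const char *prte_frameworks[] = { "plm", "ras", "rmaps", "odls", NULL };

/* True if param starts with "<framework>_" for any framework in list. */
static int matches_framework(const char *param, const char **list)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        size_t n = strlen(list[i]);
        if (strncmp(param, list[i], n) == 0 && param[n] == '_') return 1;
    }
    return 0;
}

/* Hypothetical "best match": PMIx or PRRTE if a framework matches,
 * otherwise fall back to Open MPI. */
project guess_project(const char *param)
{
    if (matches_framework(param, pmix_frameworks)) return PROJ_PMIX;
    if (matches_framework(param, prte_frameworks)) return PROJ_PRTE;
    return PROJ_OMPI;
}

/* Writes "<PREFIX><param>" into buf; returns snprintf's result. */
int mca_env_name(char *buf, size_t len, project p, const char *param)
{
    return snprintf(buf, len, "%s%s", env_prefix[p], param);
}
```

For example, mca_env_name(buf, sizeof buf, guess_project("ptl_base_verbose"), "ptl_base_verbose") yields PMIX_MCA_ptl_base_verbose. The explicit --omca / --pmixmca / --prtemca options avoid relying on any such guess.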
Install
Packages…
- OpenPMIx RPM SPEC files
- Fedora packages…
- RockyLinux
- Spack
References
- PMIx Standard
- OpenPMIx
- OpenMPI - The role of PMIx and PRRTE
…with Slurm
- MPI Documentation, SchedMD
- Known Bugs…
Presentations…
- PMIx: A Tutorial (2019)
- PMIx: Bridging the Container Boundary (2019)
- Evaluation and Benchmarking of Singularity MPI Containers (2019/20)
- On-node Resource Manager for Containerized HPC Workloads (2019/12)
- A Scalable PMIx Database (2018)
- PMIx Multi-Cluster Operations, SC17
- PMIx Plugin with UCX Support, SC17