Slurm — RPM Binary Packages
SchedMD RPM Spec & Distribution Packages
Packages
If RPM packages are the method of choice for deployment [1]…
- …multiple sources for packages are available
- …many clusters use RPM packages for deployment [2]
- …RPMs are built from the official RPM spec files
RPM Spec
Find the official slurm.spec [3] of SchedMD on GitHub…
- …defines a common denominator for all RPM-based environments
- …allows building Slurm in the required version for any target platform
- …includes systemd service unit files and Slurm configuration examples
System integration is distribution-specific and not included in the RPM spec. The following configuration steps are assumed to be done by other means:
- …add a slurm group and user to the system
- …create and set permissions for the directories /etc/slurm/ and /var/{lib,log,spool}/slurm
- …create a systemd tmpfiles configuration for /run/slurm
- …add a log-rotation configuration in /etc/logrotate.d/slurm
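The integration steps listed above can be sketched as a small POSIX shell function. This is a minimal sketch under assumptions, not something slurm.spec or SchedMD provides: the helper name write_slurm_integration, the 0755 mode, the slurm:slurm ownership, the log file pattern and the weekly logrotate policy are illustrative and should be adapted to the site.

```shell
#!/bin/sh
# Sketch (assumptions): helper name, 0755 mode, slurm:slurm ownership, the
# *.log pattern and the weekly logrotate policy are illustrative only.
set -eu

# write_slurm_integration ROOT  (hypothetical helper)
#   Writes the files below ROOT: "/" on a real host, a scratch dir to inspect.
write_slurm_integration() {
  root=${1:?usage: write_slurm_integration ROOT}
  for d in etc/slurm var/lib/slurm var/log/slurm var/spool/slurm \
           etc/tmpfiles.d etc/logrotate.d; do
    mkdir -p "$root/$d"
  done
  # systemd-tmpfiles entry: create /run/slurm owned by slurm:slurm at boot
  printf 'd /run/slurm 0755 slurm slurm -\n' > "$root/etc/tmpfiles.d/slurm.conf"
  # logrotate entry: rotate the daemon logs weekly, keep 4 compressed copies
  cat > "$root/etc/logrotate.d/slurm" <<'EOF'
/var/log/slurm/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
EOF
}

# On the target host, as root, the account and ownership could follow with:
#   groupadd --system slurm
#   useradd --system --gid slurm --home-dir /var/lib/slurm \
#           --shell /sbin/nologin slurm
#   write_slurm_integration /
#   chown slurm:slurm /var/lib/slurm /var/log/slurm /var/spool/slurm
#   systemd-tmpfiles --create /etc/tmpfiles.d/slurm.conf
```

copytruncate is used here so the daemons do not need to be signalled to reopen their log files; a site may prefer a postrotate action instead.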
Distributions
Fedora
SchedMD recommends not using the RPMs from EPEL! [4]
Fedora Slurm packages are built from a custom slurm.spec [5] to follow Fedora policies:
- …builds are limited to Slurm dependencies in versions supported in Fedora
- …Slurm dependencies like GPU libraries are not supported
System integration in the Fedora RPM packages [6]…
- …tmpfiles.d configuration to create /run/slurm
- …/var/log/slurm and …/logrotate.d/slurm
- …/var/run/slurm, /var/spool/slurm …PID files…
- …slurm{dbd,ctld,d,restd}.service systemd units
- …
Security patches and major version upgrades:
- …security patches are supplied to recent Fedora releases
- …the state of security patches for EPEL packages is unclear
- …EPEL does not get major version updates after release
- …new major versions are typically only built for Fedora Rawhide
OpenSUSE
The OpenSUSE Slurm package is built from a custom slurm.spec [7]:
- …OpenSUSE Tumbleweed uses a conventional naming scheme like slurm-23.02.7
- …OpenSUSE Leap uses package names like slurm_23_02-23.02.7
- …codifies the major version in the package name
- …likely to distinguish multiple versions on one platform
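The Leap naming scheme can be reproduced with a few lines of shell. The rule used here (major and minor version joined by an underscore, followed by the full version) is inferred only from the single example above, and leap_pkg_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (assumption): naming rule inferred from slurm_23_02-23.02.7 only;
# leap_pkg_name is a hypothetical helper.
set -eu

leap_pkg_name() {
  ver=${1:?usage: leap_pkg_name VERSION}
  # Keep "major.minor" and replace the dot: 23.02.7 -> 23_02
  major=$(printf '%s\n' "$ver" | cut -d. -f1-2 | tr . _)
  printf 'slurm_%s-%s\n' "$major" "$ver"
}

leap_pkg_name 23.02.7   # prints slurm_23_02-23.02.7
```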
Debian
For reference… Debian/Ubuntu packages (since 23.11) [8]
- …see slurm-23.11/debian
- …packages will be under a common slurm-smd-* prefix
- …avoids conflicts with the existing mix of slurm-wlm/slurm-llnl packages
- …package layout aligned with the RPM layout from slurm.spec …not with the existing unofficial Slurm Debian packages
Prerequisites
Install the build environment locally:
# ...install DNF plugins package and the EPEL repository
dnf install -y dnf-plugins-core epel-release @development rpm-build
# ...enable the PowerTools repository (EL 8; on EL 9 it is called crb)
dnf config-manager --set-enabled powertools
Create an Apptainer container with the build environment…
# Store build artifacts to a temporary directory...
cd $(mktemp -d /var/tmp/$USER-apptainer-XXXXXX)
export APPTAINER_TMPDIR=$PWD && export APPTAINER_CACHEDIR=$PWD
# Create an Apptainer definition file...
cat > apptainer-el8.def <<EOF
Bootstrap: docker
From: quay.io/rockylinux/rockylinux:8
%environment
export LC_ALL=C
%post
dnf install -y dnf-plugins-core epel-release @development rpm-build
dnf config-manager --set-enabled powertools
EOF
# ...and build the base container
apptainer build el8.sif apptainer-el8.def
Dependencies
Read the list of dependencies [9] collected by SchedMD.
List of RPM packages for Slurm dependencies…
- hwloc-devel …task/cgroup plugin
- hdf5-devel …HDF5 job profiling
- man2html …HTML man pages
- freeipmi-devel …acct_gather_energy/ipmi accounting plugin
- rdma-core-devel …acct_gather_interconnect/ofed
- libjwt-devel …for JWT authentication
- lua-devel …Lua API support
- munge-devel …auth/munge plugin
- mariadb-devel …MySQL support for accounting
- pam-devel …PAM support
- numactl-devel …task/affinity plugin
- readline-devel …Readline support in scontrol and sacctmgr
- rrdtool-devel …ext_sensors/rrd plugin
- http-parser-devel & json-c-devel …slurmrestd REST API
# ...install Slurm dependencies
dnf install -y \
bzip2-devel \
freeipmi-devel \
glib2-devel gtk2-devel \
hdf5-devel http-parser-devel hwloc hwloc-devel json-c-devel \
libcurl-devel libibmad libibumad libssh2-devel libjwt-devel \
lua lua-devel lz4-devel ncurses-devel numactl numactl-devel \
man2html mariadb-server mariadb-devel munge munge-libs munge-devel \
openmpi openssl openssl-devel \
pam-devel pmix-devel perl-Switch perl-ExtUtils-MakeMaker python3 \
readline-devel rdma-core-devel rrdtool-devel \
ucx ucx-devel ucx-ib zlib-devel
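To keep the installed packages in step with the --with options used in the package builds further down, the mapping from the dependency list can be encoded once. This is a sketch: the helper name devel_pkgs_for is hypothetical, and the table only covers the options this page uses, taken from the list above.

```shell
#!/bin/sh
# Sketch (assumptions): devel_pkgs_for is a hypothetical helper; the mapping
# covers only the slurm.spec --with options used on this page.
set -eu

# Print the -devel package(s) needed for each slurm.spec --with option
devel_pkgs_for() {
  for opt in "$@"; do
    case $opt in
      hdf5)       echo hdf5-devel ;;
      hwloc)      echo hwloc-devel ;;
      lua)        echo lua-devel ;;
      mysql)      echo mariadb-devel ;;
      numa)       echo numactl-devel ;;
      pmix)       echo pmix-devel ;;
      slurmrestd) echo http-parser-devel json-c-devel ;;
      ucx)        echo ucx-devel ;;
      *) echo "devel_pkgs_for: no mapping for '$opt'" >&2; return 1 ;;
    esac
  done
}

# Usage (EL 8): dnf install -y $(devel_pkgs_for hdf5 lua slurmrestd)
devel_pkgs_for hdf5 lua slurmrestd
```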
OpenMPI & UCX
If possible, use the latest versions of both OpenPMIx and UCX…
- PMIx compatibility should not be an issue anymore [10]
- OpenMPI (and UCX) is available from NVIDIA in the MLNX OFED distribution [11]
Install the following packages for support…
- ucx, ucx-devel and ucx-ib for the UCX communication layer
- pmix-devel for PMIx support in MPI launches …enable with the build configuration option --with-pmix
Hardware Support
Support for specific hardware features (GPUs, network topology, etc.) and their corresponding interface libraries…
NVIDIA GPUs require the libnvidia-ml (NVML) development library:
# ...CUDA on EL 8
dnf config-manager --add-repo \
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
dnf -y install cuda-nvml-devel-12-2 cuda-12-2
AMD Instinct GPUs [12] require the ROCm development library…
- …the build configuration option to enable it is --with-rsmi
- …the configuration script searches for rocm_smi.h
- …since Fedora EPEL 9 [13] a rocm-smi-devel package is available
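Before building, it can help to confirm that the headers the configure script looks for are actually installed. A minimal sketch, assuming the headers live in PREFIX/include (rocm_smi.h under an extra rocm_smi/ sub-directory) and that /opt/rocm, /usr/local/cuda and /usr are the candidate prefixes; find_prefix is a hypothetical helper. A prefix found this way would be handed to configure as --with-rsmi=PATH, or, assumed here, as --with-nvml=PATH for NVIDIA.

```shell
#!/bin/sh
# Sketch (assumptions): candidate prefixes and the rocm_smi/ sub-directory are
# guesses to adapt; find_prefix is a hypothetical helper.
set -eu

# find_prefix HEADER PREFIX...
#   Prints the first PREFIX that contains include/HEADER, fails if none does.
find_prefix() {
  hdr=${1:?usage: find_prefix HEADER PREFIX...}
  shift
  for p in "$@"; do
    if [ -f "$p/include/$hdr" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

# AMD ROCm SMI (configure option --with-rsmi=PATH)
find_prefix rocm_smi/rocm_smi.h /opt/rocm /usr || echo "rocm_smi.h not found" >&2
# NVIDIA NVML (configure option assumed to be --with-nvml=PATH)
find_prefix nvml.h /usr/local/cuda /usr || echo "nvml.h not found" >&2
```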
Package Build
Mock
Build the default RPM packages for a target platform from the SchedMD source code archive [^4ES6u]…
cd $(mktemp -d /tmp/slurm-XXXXX)
wget https://download.schedmd.com/slurm/slurm-23.11.1.tar.bz2
tar -xf slurm-23.11.1.tar.bz2
# ...create the SRPM package first
mock --root rocky+epel-8-x86_64 --install perl
mock --root rocky+epel-8-x86_64 --no-clean \
--sources slurm-23.11.1.tar.bz2 \
--spec slurm-23.11.1/slurm.spec \
--buildsrpm --resultdir=$PWD
# ...build the RPM packages from the source package
mock --root rocky+epel-8-x86_64 \
--rebuild slurm-23.11.1-1.el8.src.rpm \
--resultdir=$PWD
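The mock steps above can be wrapped in a function parameterized by the Slurm version and the mock root, with a dry-run switch that only prints the commands. A sketch under assumptions: run, mock_build and DRY_RUN are hypothetical names, the function expects the tarball and its extracted directory in the current directory, and the SRPM file name assumes release 1 with the el8 dist tag of the default root.

```shell
#!/bin/sh
# Sketch (assumptions): run, mock_build and DRY_RUN are hypothetical names;
# expects slurm-VERSION.tar.bz2 and its extracted directory in $PWD, and the
# SRPM name assumes release 1 and the el8 dist tag of the default root.
set -eu

# Execute a command, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# mock_build VERSION [ROOT]: build the SRPM, then the binary RPMs
mock_build() {
  version=${1:?mock_build needs a VERSION}
  root=${2:-rocky+epel-8-x86_64}
  run mock --root "$root" --install perl
  run mock --root "$root" --no-clean \
    --sources "slurm-$version.tar.bz2" \
    --spec "slurm-$version/slurm.spec" \
    --buildsrpm --resultdir="$PWD"
  run mock --root "$root" \
    --rebuild "slurm-$version-1.el8.src.rpm" \
    --resultdir="$PWD"
}

# Print the three mock commands for 23.11.1 without running them
DRY_RUN=1 mock_build 23.11.1
```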
Configure build options …in case of issues, check build.log in the result directory
# ...install dependencies
mock --root rocky+epel-8-x86_64 --install \
bzip2-devel \
freeipmi-devel \
glib2-devel gtk2-devel \
hdf5-devel http-parser-devel hwloc hwloc-devel json-c-devel \
libcurl-devel libibmad libibumad libssh2-devel libjwt-devel \
lua lua-devel lz4-devel \
ncurses-devel numactl numactl-devel \
man2html mariadb-server mariadb-devel munge munge-libs munge-devel \
openmpi openssl openssl-devel \
pam-devel pmix-devel perl-Switch perl-ExtUtils-MakeMaker python3 \
readline-devel rdma-core-devel rrdtool-devel \
ucx ucx-devel ucx-ib \
zlib-devel
# ...build the Slurm packages
mock --root rocky+epel-8-x86_64 --no-clean \
--with hdf5 --with hwloc --with lua \
--with mysql --with numa --with pmix \
--with slurmrestd --with ucx \
--without debug --without x11 \
--rebuild slurm-23.11.1-1.el8.src.rpm \
--resultdir=$PWD
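Since mock and rpmbuild accept the same --with/--without switches, the feature selection can be kept in one place. A minimal sketch; SLURM_WITH, SLURM_WITHOUT and spec_opts are hypothetical names, and the defaults mirror the options used above.

```shell
#!/bin/sh
# Sketch (assumptions): SLURM_WITH, SLURM_WITHOUT and spec_opts are
# hypothetical names; the defaults mirror the options used on this page.
set -eu

SLURM_WITH=${SLURM_WITH:-"hdf5 hwloc lua mysql numa pmix slurmrestd ucx"}
SLURM_WITHOUT=${SLURM_WITHOUT:-"debug x11"}

# Expand the two lists into "--with NAME ... --without NAME ..."
spec_opts() {
  opts=
  for o in $SLURM_WITH; do opts="$opts --with $o"; done
  for o in $SLURM_WITHOUT; do opts="$opts --without $o"; done
  printf '%s\n' "${opts# }"
}

# The unquoted expansion is split into words on purpose, e.g.:
#   mock --root rocky+epel-8-x86_64 --no-clean $(spec_opts) \
#     --rebuild slurm-23.11.1-1.el8.src.rpm --resultdir="$PWD"
#   rpmbuild -ta slurm-23.11.1.tar.bz2 $(spec_opts)
spec_opts
```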
Apptainer
Create an Apptainer container with all Slurm build dependencies using the following definition file apptainer.def:
Bootstrap: localimage
From: el8.sif
%post
dnf install -y \
bzip2-devel \
freeipmi-devel \
glib2-devel gtk2-devel \
hdf5-devel http-parser-devel hwloc hwloc-devel json-c-devel \
libcurl-devel libibmad libibumad libssh2-devel libjwt-devel \
lua lua-devel lz4-devel \
ncurses-devel numactl numactl-devel \
man2html mariadb-server mariadb-devel munge munge-libs munge-devel \
openmpi openssl openssl-devel \
pam-devel pmix-devel perl-Switch perl-ExtUtils-MakeMaker python3 \
readline-devel rdma-core-devel rrdtool-devel \
ucx ucx-devel ucx-ib zlib-devel
Build the container from the definition file above…
apptainer build build-container.sif apptainer.def
…and build Slurm packages within this container:
# Download the required version of Slurm...
export VERSION=23.11.1
wget https://download.schedmd.com/slurm/slurm-$VERSION.tar.bz2
# ...and build the package in the Apptainer container
apptainer exec build-container.sif \
rpmbuild -ta slurm-$VERSION.tar.bz2 \
--with hdf5 \
--with hwloc \
--with lua \
--with mysql \
--with numa \
--with pmix \
--with slurmrestd \
--with ucx \
--without debug \
--without x11 \
2>&1 | tee build.log
# ...packages should be available in ~/rpmbuild (binary RPMs per architecture, source RPMs in SRPMS)
ls -1 ~/rpmbuild/RPMS/*/slurm*.rpm ~/rpmbuild/SRPMS/slurm*.rpm
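A glob like RPMS/**/ only descends into sub-directories when bash's globstar option is enabled, so a find-based copy is a more robust way to gather the build results. A sketch assuming the default rpmbuild layout (binary RPMs in RPMS/ARCH/, source RPMs directly in SRPMS/); collect_rpms is a hypothetical helper, and its default pattern *slurm*.rpm also matches $DOMAIN-prefixed packages.

```shell
#!/bin/sh
# Sketch (assumptions): default rpmbuild layout, RPMS/<arch>/*.rpm and
# SRPMS/*.src.rpm; collect_rpms is a hypothetical helper.
set -eu

# collect_rpms [TOPDIR] [DEST] [PATTERN]
collect_rpms() {
  top=${1:-$HOME/rpmbuild}
  dest=${2:-.}
  pattern=${3:-*slurm*.rpm}      # also matches $DOMAIN-slurm packages
  mkdir -p "$dest"
  # find descends into RPMS/<arch>/ without needing bash's globstar
  find "$top/RPMS" "$top/SRPMS" -type f -name "$pattern" \
    -exec cp {} "$dest"/ \;
}

# Usage: collect_rpms ~/rpmbuild .
```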
Extend the apptainer.def file above to include RPM files from the ~/rpmbuild/ directory in your home, which may store previously built dependency packages:
#...
%files
${HOME}/rpmbuild/RPMS/x86_64/* /localrepo/
%post
#...
dnf install -y createrepo_c
cd /localrepo && createrepo_c .
echo -e "[local-repo]\nname=local-repo\nbaseurl=file:///localrepo\nenabled=1\nmetadata_expire=1d\ngpgcheck=0" > /etc/yum.repos.d/local_repo.repo
Customize
Export environment variables to customize the build process…
- VERSION …version of Slurm to build
- DOMAIN …unique identifier for your environment …to distinguish packages from Fedora EPEL [^4QcLO]
export VERSION=23.02.0
# ...derive domain name from the host
export DOMAIN=$(hostname -d | cut -d. -f1)
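hostname -d prints nothing on hosts without a domain part, which would leave DOMAIN empty and make the renamed directory and package name start with a dash. A sketch with a fallback, assuming the FQDN is passed in explicitly; derive_domain and the fallback label site are hypothetical.

```shell
#!/bin/sh
# Sketch (assumptions): derive_domain and the fallback label "site" are
# hypothetical; the FQDN is passed in instead of calling hostname -d.
set -eu

# derive_domain FQDN [FALLBACK]: first label of the domain part
derive_domain() {
  fqdn=${1:?derive_domain needs an FQDN}
  dom=${fqdn#*.}          # node01.hpc.example.org -> hpc.example.org
  if [ "$dom" = "$fqdn" ]; then
    dom=                  # no dot at all, so there is no domain part
  fi
  dom=${dom%%.*}          # hpc.example.org -> hpc
  printf '%s\n' "${dom:-${2:-site}}"
}

# Usage on the build host:
#   export DOMAIN=$(derive_domain "$(hostname -f)")
```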
Modify the RPM slurm.spec configuration in the source archive…
# ...download the source archive from SchedMD
wget https://download.schedmd.com/slurm/slurm-$VERSION.tar.bz2
# ...extract the source archive
tar -xf slurm-$VERSION.tar.bz2
mv slurm-$VERSION $DOMAIN-slurm-$VERSION
# ...modify the Slurm RPM Spec configuration according to your needs
$EDITOR $DOMAIN-slurm-$VERSION/slurm.spec
Prefix the package name to distinguish it from the Fedora EPEL packages (write the actual prefix; rpmbuild does not expand the shell variable $DOMAIN in the spec file):
Name: $DOMAIN-slurm
Conflicts: slurm-contribs,slurm-devel,slurm-doc,slurm-gui,slurm-libs,slurm-nss_slurm,slurm-openlava,slurm-pam_slurm,slurm-perlapi,slurm-pmi,slurm-pmi-devel,slurm-rrdtool,slurm-slurmctld,slurm-slurmd,slurm-slurmdbd,slurm-slurmrestd,slurm-torque
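Instead of editing slurm.spec in $EDITOR, the Name and Conflicts change can be applied non-interactively. A sketch assuming GNU sed (in-place -i and the one-line a command) and a spec with exactly one line starting with Name:; prefix_spec_name is a hypothetical helper, and DOMAIN must not contain / or & because it is inserted into the sed expression.

```shell
#!/bin/sh
# Sketch (assumptions): GNU sed (-i and the one-line "a" command), a spec
# with a single "Name:" line, and a DOMAIN without "/" or "&" characters;
# prefix_spec_name is a hypothetical helper.
set -eu

# prefix_spec_name SPECFILE DOMAIN
prefix_spec_name() {
  spec=${1:?prefix_spec_name needs a spec file}
  domain=${2:?prefix_spec_name needs a domain}
  conflicts=slurm-contribs,slurm-devel,slurm-doc,slurm-gui,slurm-libs
  conflicts=$conflicts,slurm-nss_slurm,slurm-openlava,slurm-pam_slurm
  conflicts=$conflicts,slurm-perlapi,slurm-pmi,slurm-pmi-devel,slurm-rrdtool
  conflicts=$conflicts,slurm-slurmctld,slurm-slurmd,slurm-slurmdbd
  conflicts=$conflicts,slurm-slurmrestd,slurm-torque
  # Rename the package, then add the Conflicts line right after Name:
  sed -i \
    -e "s/^Name:[[:space:]].*/Name: $domain-slurm/" \
    -e "/^Name:/a Conflicts: $conflicts" \
    "$spec"
}

# Usage: prefix_spec_name "$DOMAIN-slurm-$VERSION/slurm.spec" "$DOMAIN"
```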
Create a new source archive…
tar -cjf $DOMAIN-slurm-$VERSION.tar.bz2 $DOMAIN-slurm-$VERSION
rm -rf $DOMAIN-slurm-$VERSION
Build new Slurm packages …build options are described in slurm.spec:
# ...run the RPM build command
rm -rf ~/rpmbuild \
&& rpmbuild -ta $DOMAIN-slurm-$VERSION.tar.bz2 \
--define "%domain $DOMAIN" \
--with hdf5 \
--with hwloc \
--with lua \
--with mysql \
--with numa \
--with pmix \
--with slurmrestd \
--with ucx \
--without debug \
--without x11 \
2>&1 | tee rpmbuild.log
# ...copy RPM packages from the build directory
cp ~/rpmbuild/RPMS/*/*.rpm ~/rpmbuild/SRPMS/*.rpm .
# ...list all files in the packages
rpm -qlp $DOMAIN-slurm*.rpm
Modify the installation prefix with an RPM macro file [^WMtZM]:
# ...unconventional file locations
cat >> ~/.rpmmacros <<EOF
%_prefix /opt/slurm/$VERSION
%_slurm_sysconfdir %{_prefix}/etc/slurm
%_defaultdocdir %{_prefix}/doc
EOF
Test
First tests are usually performed in a virtual environment…
- …one option is to use Vagrant to set up a test environment [14]
- …otherwise create an accessible RPM package repository and use it from a dedicated test infrastructure
Footnotes
[1] Slurm Quick Start Administrator Guide, SchedMD, https://slurm.schedmd.com/quickstart_admin.html
[2] Slurm Installation and Upgrading, Niflheim Cluster, https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#build-slurm-rpms
[3] Slurm RPM Spec, SchedMD, GitHub, https://github.com/SchedMD/slurm/blob/master/slurm.spec
[4] Slurm Community BoF, SchedMD SC23, 2023/11, https://slurm.schedmd.com/SC23/Slurm-SC23-BOF.pdf
[5] slurm.spec, Fedora Project, https://src.fedoraproject.org/rpms/slurm/blob/rawhide/f/slurm.spec
[6] slurm.spec line 254, Fedora Project, https://src.fedoraproject.org/rpms/slurm/blob/rawhide/f/slurm.spec#_294
[7] slurm.spec, OpenSUSE Build Service, https://build.opensuse.org/package/view_file/network:cluster/slurm/slurm.spec
[8] Slurm Community BoF, SchedMD SC23, 2023/11, https://slurm.schedmd.com/SC23/Slurm-SC23-BOF.pdf
[9] Slurm Dependencies, SchedMD, https://slurm.schedmd.com/download.html
[10] PMIx Slurm Compatibility Matrix, OpenPMIx Project, https://openpmix.github.io/support/how-to/slurm-support.html
[11] MLNX_OFED Linux drivers, NVIDIA, https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
[12] AMD Instinct MI Series, AMD, https://www.amd.com/en/support/server-accelerators/amd-instinct/amd-instinct-mi-series/instinct-mi100
[13] AMD ROCm Packages, Fedora Project, https://packages.fedoraproject.org/search?query=rocm
[14] Vagrant Test Environment, GitHub, https://github.com/vpenso/vagrant-playground/tree/master/slurm/packages