Slurm — Service Daemons

Configuration & Operation of Slurm Services

HPC
Slurm
Published

November 3, 2015

Modified

October 23, 2025

Field Notes 8 [1], SLUG’2024, SchedMD

Authentication Daemon

sackd — Provides authentication for client commands

  • SACK — [S]lurm’s [a]uth and [c]red [k]iosk
  • Related to the auth/slurm and cred/slurm plugins
    • Slurm internal authentication and job credential plugins
    • …alternative to MUNGE authentication service
    • …separate from existing auth/jwt plugin
  • Requires a shared /etc/slurm/slurm.key throughout the cluster (see the sketch below)
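
A minimal sketch for generating and protecting the shared key (key size and distribution method are site decisions):

# create a random shared key on the controller...
dd if=/dev/random of=/etc/slurm/slurm.key bs=1024 count=1

# ...restrict access and copy the same file to all nodes
chown slurm:slurm /etc/slurm/slurm.key
chmod 600 /etc/slurm/slurm.key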

Control Daemon

slurmctld — The central management daemon of Slurm

# install from an RPM package
dnf install -y slurm-slurmctld

Packages include a systemd slurmctld.service unit [2]:

systemctl enable --now slurmctld
journalctl -u slurmctld

The systemd service units use Type=notify; the option --systemd is passed to slurm{ctld,d}

  • Catches configuration mistakes …continues execution (instead of failing)
  • Reconfigure allows for almost any (supported) configuration changes to take place
    • No explicit restart of the daemon required anymore (since 23.05)
    • SIGHUP has similar behaviour to restarting the slurm{ctld,d} processes (see below)
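
A minimal sketch: sending SIGHUP to the running daemon triggers the same re-read of the configuration for that daemon only:

# equivalent to a reconfigure for the local daemon
pkill -HUP slurmctld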

Configuration

Path                   Description
/etc/slurm/slurm.conf  General Slurm configuration information

scontrol — Manages the Slurm configuration and state

# instruct all slurmctld and slurmd daemons to re-read the configuration file
scontrol reconfigure

Example of creating a slurm user with auxiliary directories:

# create the `slurm` user & group
groupadd slurm --gid 900
useradd slurm --system --gid 900 --shell /bin/bash \
              --no-create-home --home-dir /var/lib/slurm \
              --comment "SLURM workload manager"

# create prerequisite directories
mkdir -p /etc/slurm \
         /var/lib/slurm \
         /var/log/slurm \
         /var/spool/slurm/ctld

# adjust directory permissions
chown slurm:slurm /var/lib/slurm \
                  /var/log/slurm \
                  /var/spool/slurm/ctld

SlurmUser

SlurmUser — User that the slurmctld daemon executes as

slurm.conf
SlurmUser=slurm
  • For security purposes, a user other than “root” is recommended
  • Typically a user named slurm runs slurmctld
  • Must exist on all nodes for communications between Slurm components
# debug in the foreground
su -l slurm -c 'slurmctld -Dvvvvv'

SlurmctldPidFile

SlurmctldPidFile — Path to store the daemon process id

slurm.conf
# Defaults to `/var/run/` (deprecated), use `/run` instead
SlurmctldPidFile=/run/slurmctld/slurmctld.pid

Make sure the required directory /run/slurmctld exists

cat > /etc/tmpfiles.d/slurmctld.conf <<EOF
d /run/slurmctld 0770 root slurm -
EOF
# apply the tmpfiles configuration
systemd-tmpfiles --create --prefix=/run/slurmctld
ls -dl /run/slurmctld/ # verify

Note: slurmctld.service uses the following configuration:

>>> systemctl cat slurmctld | grep Runtime
RuntimeDirectory=slurmctld
RuntimeDirectoryMode=0755

Scalability

Maximum job throughput and overall slurmctld responsiveness (under heavy load) are governed by the latency of reading/writing to the StateSaveLocation. In high-throughput environments (more than ~200.000 jobs/day) the local storage performance of the controller needs to be considered:

  • Fewer but fast cores (high clock frequency) on the slurmctld host are preferred
  • Fast storage for the StateSaveLocation (preferably NVMe)
    • IOPS to this location can become a major bottleneck to job throughput (a quick check is sketched below)
    • At least two directories and two files created per job
    • Corresponding unlink() calls will add to the load
    • Use of array jobs significantly improves performance…

Hardware: example minimum system requirements for ~100.000 jobs/day with 500 nodes are 16 GB RAM, a dual-core CPU, and a dedicated SSD/NVMe (for the state save). The amount of RAM required increases with larger workloads and the number of compute nodes.

slurmdbd should be hosted on a dedicated node, preferably with a dedicated SSD/NVMe for the relational database (of a local MariaDB instance). The RAM requirement goes up in relation to the number of jobs queried from the database. A minimum system requirement to support 500 nodes with ~100.000 jobs/day is 16-32 GB RAM on the database host alone.
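
A quick way to gauge whether the storage keeps up is a small random-write test against the StateSaveLocation path, sketched here assuming fio is installed and /var/spool/slurm/ctld is the configured location:

# measure 4k random-write IOPS on the state save directory
fio --name=statesave --directory=/var/spool/slurm/ctld \
    --rw=randwrite --bs=4k --size=256m --ioengine=libaio --direct=1 \
    --iodepth=16 --runtime=30 --time_based --group_reporting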

Static Nodes

Slurm services use slurm.conf to discover all slurmd instances for communication

  • Requires all nodes to be known in advance to allow a static configuration
  • Requires slurm.conf to be synchronized on all cluster nodes
  • If a static node configuration needs modification (sketched below):
    1. Stop slurmctld
    2. Update the configuration for Nodes= (in slurm.conf)
    3. Restart slurmd daemons on all nodes
    4. (Re-)start slurmctld
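
A sketch of this procedure, assuming systemd units everywhere and a parallel shell like pdsh being available (node list and editor are placeholders):

systemctl stop slurmctld                          # 1. stop the controller
vi /etc/slurm/slurm.conf                          # 2. update the node definitions
# ...synchronize slurm.conf to all nodes (site-specific)
pdsh -w 'node[001-500]' systemctl restart slurmd  # 3. restart all slurmd daemons
systemctl start slurmctld                         # 4. (re-)start the controller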

Dynamic Nodes

Permits nodes to be added to/deleted from the system without adding them to slurm.conf [3][4]

  • …without restarting slurmctld & slurmd
  • Use cases…
    • Utilization of temporary node resources
    • Multiple dynamic clusters sharing a common pool of resources
    • Integration with cloud platforms with dynamic horizontal scaling

Controller uses NodeAddr/NodeHostname for dynamic slurmd registrations

  • …only supported with SelectType=select/cons_tres
  • …set MaxNodeCount= in the configuration
  • Limitations…
    • …suboptimal internal representation of nodes
    • …inaccurate information for topology plugins
    • …requires scontrol reconfigure or service restart

Configuration

By default nodes aren’t added to any partition…

  • Nodes=ALL in the partition definition…
  • ⇒ have all nodes in the partition, even new dynamic nodes
PartitionName=open Nodes=ALL MaxTime=INFINITE Default=Yes State=Up
  • Nodeset= …create nodesets, add the nodeset to a partition…
  • ⇒ register dynamic nodes with a feature to add them to the nodeset
Nodeset=ns1 Feature=f1
Nodeset=ns2 Feature=f2
PartitionName=all Nodes=ALL
PartitionName=p1 Nodes=ns1
PartitionName=p2 Nodes=ns2
PartitionName=p3 Nodes=ns1,ns2

Run scontrol reconfigure after modifications to nodesets and partitions!

Register a node for example with slurmd -Z --conf="Feature=f1"

Operation

Dynamic registration requires option slurmd -Z

Two ways to add a dynamic node…

  • Start slurmd with option -Z (dynamic node)…
    • …option --conf …defines additional parameters of a dynamic node
    • …hardware topology optional …overwrites slurmd -C if specified
    • NodeName= not allowed in --conf
echo 'SLURMD_OPTIONS=-Z' >> /etc/default/slurmd
systemctl restart slurmd.service
  • scontrol create state=FUTURE nodename= [conf syntax] …see the sketch below
    • …allows overwriting the slurmd -C hardware topology
    • …appending a features= for association with a defined nodeset=
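
A minimal sketch of the second method (node name, resources and feature are illustrative):

# pre-define a dynamic node in state FUTURE using node configuration syntax
scontrol create nodename=dyn001 cpus=16 realmemory=64000 features=f1 state=FUTURE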

Delete with scontrol delete nodename=

  • …needs to be idle …cleared from any reservation
  • Stop slurmd after the delete command

nss_slurm

…optional Slurm NSS plugin …password and group resolution (configuration sketched below)

  • …serviced through the local slurmstepd process
  • …removes load from network directory services during the launch of huge numbers of jobs/steps
  • …returns only results for processes within a given job step
  • …not meant as replacement for network directory services like LDAP, SSSD, or NSLCD
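
Activation happens through the NSS configuration of the compute nodes; a minimal sketch of the /etc/nsswitch.conf entries (source order is a site decision):

# /etc/nsswitch.conf on the compute nodes
passwd: slurm files sss
group:  slurm files sss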

LDAP-less Control Plane

slurmctld without LDAP (Slurm 23.11)…

  • …enabled through auth/slurm credential format extensibility
  • …username, UID, GID captured alongside the job submission
  • auth/slurm permits the login node to securely provide these details
  • …set AuthInfo=use_client_ids in slurm{dbd}.conf

slurmdbd

SlurmDBD aka slurmdbd (slurm database daemon)…

  • …interface to the relational database storing accounting records
  • …configuration is available in slurmdbd.conf
    • …should be protected from unauthorized access …contains a database password
    • …file should be only on the computer where SlurmDBD executes
    • …only be readable by the user which executes SlurmDBD (e.g. slurm) …see the example below
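
For example, assuming the slurm user runs SlurmDBD:

# restrict slurmdbd.conf to the user running the daemon
chown slurm:slurm /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf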

Documentation …Slurm Database Daemon …slurmdbd.conf

  • …host running the database (MySQL/MariaDB) is referred to as the back-end
  • …node hosting the database daemon is called the front-end

Run the daemon in the foreground with verbose mode to debug the configuration:

# run in foreground with debugging enabled
slurmdbd -Dvvvvv

# follow the daemon logs
multitail /var/log/slurm{ctld,dbd}

Back-End

Provides the RDBMS back-end to store the accounting database …interfaced by slurmdbd

  • …dedicated database …typically called slurm_acct_db
  • …grant the corresponding permissions on the database server
cat > /tmp/slurm_user.sql <<EOF
grant all on slurm_acct_db.* TO 'slurm'@'node' identified by '12345678' with grant option;
grant all on slurm_acct_db.* TO 'slurm'@'node.fqdn' identified by '12345678' with grant option;
EOF
sudo mysql < /tmp/slurm_user.sql

On start, slurmdbd will first try to connect to the back-end database…

  • StorageHost database hostname
  • StorageUser database user name
  • StoragePass database user password
  • StorageType database type
  • StorageLoc database name on the database server (defaults to slurm_acct_db)
# back-end database
#
StorageHost=lxrmdb04
#StoragePort=3306
StorageUser=slurm
StoragePass=12345678
StorageType=accounting_storage/mysql
#StorageLoc=slurm_acct_db

Launch the interactive mysql shell…

/* ...list databases */
show databases like 'slurm%';

/* ...check users access to databases */
select user,host from mysql.user where user='slurm';

Connect from a remote node…

  • …requires the MySQL client (dnf install -y mysql)
  • …use the password set with StoragePass in slurmdbd.conf
mysql_password=$(grep StoragePass /etc/slurm/slurmdbd.conf | cut -d= -f2)
database=$(grep StorageHost /etc/slurm/slurmdbd.conf | cut -d= -f2)

# connect to the database server
mysql --host $database --user slurm --password="$mysql_password" slurm_acct_db

Front-End

Configure the Slurm controller to write accounting records to a back-end SQL database using slurmdbd as interface:

AccountingStorageType The accounting storage mechanism type. Acceptable values at present include “accounting_storage/none” and “accounting_storage/slurmdbd”. The “accounting_storage/slurmdbd” value indicates that accounting records will be written to the Slurm DBD, which manages an underlying MySQL database. See “man slurmdbd” for more information. The default value is “accounting_storage/none” and indicates that account records are not maintained. Also see DefaultStorageType.
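
A minimal sketch of the corresponding slurm.conf entries (hostname taken from the example output further below, port commented out since 6819 is the default):

slurm.conf
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=lxbk0263
#AccountingStoragePort=6819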

In order to enable all nodes to query the accounting database, make sure that the following configuration is correct:

AccountingStorageHost The name of the machine hosting the accounting storage database. Only used with systems using SlurmDBD, ignored otherwise.

Note that the configuration above refers to the node hosting the Slurm database daemon, not the back-end database. An error similar to the following is emitted by sacct if the connection cannot be established:

sacct: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacct: error: Sending PersistInit msg: Connection refused
sacct: error: Problem talking to the database: Connection refused

Changes to this configuration require scontrol reconfigure to be propagated:

# check the configuration with...
>>> scontrol show config | grep AccountingStorage.*Host
AccountingStorageBackupHost = (null)
AccountingStorageHost   = lxbk0263
AccountingStorageExternalHost = (null)

Purge

The database can grow very large with time…

  • …depends on the job throughput
  • …purging old records from the tables helps performance
  • …typically no need to access very old job metadata

Remove old data from the accounting database

# data retention
#
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month 

Sites requiring access to historic account data…

  • …separated from the archive options described in the next section
  • …may host a dedicated isolated instance of slurmdbd
  • …runs a copy or part of a copy of the production database
  • …provides quick access to query historical information

Archive

Archive accounting database:

# data archive
#
ArchiveDir=/var/spool/slurm/archive
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
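
With archiving enabled, purged records are written to ArchiveDir; they can also be dumped or re-loaded on demand, sketched here (check the sacctmgr man page of the installed version for the exact options):

# write archivable records to the archive directory
sacctmgr archive dump

# load a previously archived file back into the database
sacctmgr archive load file=/var/spool/slurm/archive/<archive_file>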

slurmrestd

The slurmrestd service translates JSON/YAML over HTTP requests into Slurm RPC requests…

  • Allows submitting and managing jobs through REST calls (for example via curl) …see the sketch below
  • Launch and manage batch jobs from a (web-)service
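
A hypothetical example, assuming slurmrestd listens on localhost:6820 with JWT authentication and that the API version segment matches the installed release:

# obtain a JWT for the current user and ping the REST API
export $(scontrol token)
curl -s -H "X-SLURM-USER-NAME: $USER" \
        -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
        http://localhost:6820/slurm/v0.0.40/ping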


slurmd

Each compute server (node) has a slurmd daemon…

  • …waits for work, executes that work…
  • …returns status, and waits for more work

Flags appended to the node state (for example in sinfo output) …see the example below

  • -…planned for backfill
  • *…not responding
  • $…maintenance
  • @…pending reboot
  • ^…rebooting
  • !…pending power down
  • %…powering down
  • ~…power off
  • #…power up & configuring
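
These characters appear as suffixes on the node state, for example (format string is illustrative):

# list nodes with their state including flag suffixes
sinfo --Node --format='%10N %12t'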

Configuration

# foreground debug mode...
slurmd -Dvvvvv

Configuration in slurm.conf (an example snippet follows the list)

  • SlurmdUser…defaults to root
  • SlurmdPort…defaults to 6818
  • SlurmdParameters…see man-page
  • SlurmdTimeout…time in seconds (defaults to 300)…
    • …before slurmctld sets an unresponsive node to state DOWN
    • …ping by Slurm internal communication mechanisms
  • SlurmdPidFile…defaults to /var/run/slurmd.pid
  • SlurmdSpoolDir…defaults to /var/spool/slurmd
    • …daemon’s state information
    • …batch job script information
  • SlurmdLogFile…defaults to syslog
  • SlurmdDebug & SlurmdSyslogDebug
    • …during operation quiet, fatal, error or info
    • …for debugging verbose, debug, debug{2,3,4,5}
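
A sketch of these parameters in slurm.conf (values are the defaults or illustrative):

slurm.conf
SlurmdUser=root
SlurmdPort=6818
SlurmdTimeout=300
SlurmdSpoolDir=/var/spool/slurmd
SlurmdPidFile=/run/slurmd.pid
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdDebug=info
SlurmdSyslogDebug=error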

RebootProgram

A node reboot requested with scontrol reboot will execute the configured RebootProgram
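
For example (node name, reason and next state are illustrative):

# drain a node and reboot it as soon as it becomes idle
scontrol reboot ASAP nextstate=resume reason="kernel update" node001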

>>> scontrol show config | grep -i reboot                       
RebootProgram           = /etc/slurm/libexec/reboot

Example…

#!/bin/bash
# prefer an IPMI power reset if ipmitool is available...
if IPMITOOL=$(command -v ipmitool); then
    # ...to overcome hanging Lustre mounts
    "$IPMITOOL" power reset
else
    /usr/bin/systemctl reboot --force
fi

Footnotes

  1. Field Notes 8, SLUG’2024
    https://slurm.schedmd.com/SLUG24/Field-Notes-8.pdf

  2. Systemd slurmctld.service Service Unit, GitHub
    https://github.com/SchedMD/slurm/blob/master/etc/slurmctld.service.in

  3. Dynamic Nodes, Slurm Administrator Documentation
    https://slurm.schedmd.com/dynamic_nodes.html

  4. Cloudy, With a Chance of Dynamic Nodes, SLUG’22
    https://slurm.schedmd.com/SLUG22/Dynamic_Nodes.pdf