Slurm — Dynamic Nodes

HPC, Slurm

Published: January 5, 2026
Modified: January 22, 2026

Overview

Why Use the Slurm Dynamic Nodes Feature?

  • Leverage Temporary Node Resources — Enables on-demand utilization of transient resources
  • Shared Resource Pool — Facilitates multiple dynamic clusters sharing a common set of resources
  • Faster Provisioning — Nodes can be registered in real time without disrupting running workloads
  • Automation Friendly — Integrates cleanly with orchestration tools
Table 1: Summary of key aspects to consider

  Aspect            Static Nodes          Dynamic Nodes
  Multi-cluster     No                    Yes
  Configuration     slurm.conf            scontrol create & slurmd -Z
  Conf. persistent  Yes                   No
  Operations        Predictable           More complex (non-persistent & invalid node states)
  Node state        Obvious (DOWN/DRAIN)  External states (nodes transition between clusters)
  Debugging         Easy                  Harder
  Model             Hardware-centric      Infrastructure-centric

What Does Infrastructure-Centric Mean in the Context of Dynamic Nodes?

  • The dynamic nodes feature is designed to work alongside infrastructure automation systems
  • For example, slurm-operator1 enables dynamic Slurm cluster management on Kubernetes
  • Major cloud providers (e.g., GCE, AWS) offer native integrations for dynamically scaling Slurm clusters
  • Flurm2 (flexible Slurm) is a collection of scripts that leverages configless mode and dynamic nodes to fluidly reallocate resources across multiple clusters

Static vs Dynamic

Static (Non-Dynamic) Nodes — Discovery via slurm.conf

  • Predefined Node Configuration: Nodes must be explicitly listed in the slurm.conf file
  • Synchronized Configuration: The slurm.conf file must be consistent across all cluster nodes
  • Modifying the static node configuration requires restarting all services (see the sketch after this list):
    1. Stop the slurmctld service
    2. Modify the Nodes= entry in the slurm.conf
    3. Restart the slurmd daemon on each node
    4. Restart the slurmctld service
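
A minimal sketch of this procedure; the NodeName= line, node names, and the use of systemd units are illustrative and may differ per installation:

# 1. Stop the controller
systemctl stop slurmctld.service

# 2. Edit the node definition in slurm.conf, e.g.
#    NodeName=node[001-016] CPUs=64 RealMemory=256000 State=UNKNOWN

# 3. Restart slurmd on every node (e.g. via pdsh or Ansible)
systemctl restart slurmd.service

# 4. Start the controller again
systemctl start slurmctld.service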

Dynamic Nodes3,4 — Node registration via CLI or REST API

  • Register Nodes on Demand: Use scontrol create and/or the REST API
  • No modification of configuration files
  • No Service Restarts: No restart of slurmctld or slurmd required
Important

Slurm periodically probes all nodes defined in its configuration to establish a connection with the corresponding slurmd instances. If nodes are migrated between multiple cluster instances, they must either be removed from the Slurm configuration or assigned unique DNS names to prevent collisions between slurmctld daemons.

Failure to do so can result in nodes being claimed by the wrong controller, leading to scheduling errors, failed job launches, or unpredictable cluster behavior. In environments where node reuse or migration is common, it is therefore recommended to enforce strict separation between cluster configurations, for example by using distinct DNS zones, node name prefixes, or automated cleanup of Slurm state before reassignment.
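
A minimal sketch of such a cleanup before reassigning a node to another cluster; node and controller names are illustrative, and configless mode is assumed:

# On the old cluster: remove the node from the controller state
scontrol delete nodename=ccexe0100

# On the node: stop slurmd, then register dynamically against the
# new cluster's controller (configless mode)
systemctl stop slurmd.service
slurmd -Z --conf-server cluster-b-ctld:6817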

Registration

Prerequisites for dynamic node registration:5

  • Node Registration Method: slurmd registers nodes using NodeAddr/NodeHostname
  • Scheduler Configuration: Dynamic nodes are supported only when SelectType=select/cons_tres is configured in slurm.conf
  • Maximum Node Limit: The MaxNodeCount= parameter must be set in slurm.conf (see the fragment after this list)
  • Current Limitations:
    • Suboptimal Node Representation: Internal data structures for dynamic nodes are less efficient
    • Topology Plugin: Topology plugin configuration must be considered during node registration
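
A minimal slurm.conf fragment covering these prerequisites; the MaxNodeCount value is illustrative:

# Dynamic nodes require the cons_tres selector
SelectType=select/cons_tres
# Upper bound on the number of nodes, including dynamic ones
MaxNodeCount=1024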

Register a node via slurmd with option -Z

# Start slurmd with an option to dynamically register
slurmd -Z #…

# persistent configuration
echo 'SLURMD_OPTIONS=-Z' >> /etc/default/slurmd
systemctl restart slurmd.service
  • Option --conf — Defines additional parameters of a dynamic node (NodeName= is not allowed)
  • Hardware resources are optional; if configured, they override the values detected by slurmd -C (see the sketch below)
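
A minimal sketch of such a registration; the resource values and feature name are illustrative:

# Describe the dynamic node inline; NodeName= is not allowed here
slurmd -Z --conf "CPUs=16 RealMemory=64000 Feature=cloud"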

Pre-register a node using the admin CLI

# add a node to the cluster
scontrol create state=FUTURE nodename=$nodeset #…

# Only dynamic nodes that have no running jobs and that are not
# part of a reservation can be deleted
scontrol delete nodename=$nodeset
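
A slightly fuller sketch that pre-registers a whole range of placeholder nodes with hardware attributes; node names and sizes are illustrative:

# Pre-register four dynamic nodes with their expected resources
scontrol create nodename=ccexe[0100-0103] cpus=16 realmemory=64000 state=FUTURE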

Node States

Check node state using scontrol show node:

>>> scontrol show node ccexe0100
NodeName=ccexe0100 Arch=x86_64 CoresPerSocket=64 
   #…
   State=IDLE+DYNAMIC_NORM ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   #…

Dynamic node states describe where a node is in its lifecycle when using the dynamic nodes feature:

State           Description
DYNAMIC_FUTURE  Placeholder representing capacity that may appear in the future
DYNAMIC_NORM    Dynamic node registered successfully via slurmd; treated as a real, active node

Resources & Features

Slurm checks resources6 during node registration — CPUs, RealMemory and TmpDisk

  • Missing resources drain a node automatically with state INVAL_REG (see the check further below)
  • Resources & features can be configured with the following methods:
    1. Statically in slurm.conf
    2. Via the scontrol admin CLI
    3. As an option to slurmd --conf= (typically via configuration management or ad-hoc scripts)
    4. Using the REST API for programmatic integration with infrastructure automation
  • Methods 2, 3 & 4 are non-persistent (dynamic); the configuration is lost when the node reboots
slurmd -Z --conf "RealMemory=714000 Feature=amd,9654"
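
To find nodes that were drained by a failed resource check, something like the following helps; the node name is illustrative:

# List drained/down nodes together with the drain reason
sinfo -R

# Inspect the full state of a suspicious node
scontrol show node ccexe0100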

Partition Assignment

By default, dynamic nodes are not added to any partition

# …unless the partition definition uses `Nodes=ALL`
PartitionName=open Nodes=ALL #…

Static Configuration

ToDo

Dynamic Features

Use NodeSet= definitions and register dynamic nodes with a matching feature to add them to a nodeset (see the sketch below)

NodeSet=ns1 Feature=f1
NodeSet=ns2 Feature=f2

PartitionName=p1 Nodes=ns1
PartitionName=p2 Nodes=ns2
PartitionName=p3 Nodes=ns1,ns2

Run scontrol reconfigure after modifications to nodesets and partitions!
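
Putting the pieces together: a dynamic node registered with feature f1 joins nodeset ns1 and thereby partitions p1 and p3 (a minimal sketch):

# Register with feature f1 -> member of ns1 -> partitions p1 & p3
slurmd -Z --conf "Feature=f1"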

Node features can be changed with scontrol update:

  • AvailableFeatures=<features> — Feature(s) available on the specified node
    • Features being removed via scontrol must not be active
    • Any previously defined available feature specification is overwritten with the new value
  • ActiveFeatures=<features> — Feature(s) currently active on the specified node
    • Any previously active feature specification is overwritten with the new value
    • ActiveFeatures must be a subset of AvailableFeatures
# remove active features by omitting them from the list
scontrol update nodename=$nodeset activefeatures="amd,epyc,9654"

# add a new available feature
scontrol update nodename=$nodeset availablefeatures="amd,epyc,9654,debug"

Footnotes

  1. Kubernetes Operator for Slurm Clusters, GitHub
     https://github.com/SlinkyProject/slurm-operator

  2. A novel approach to dynamic computing using Slurm, PSI
     https://github.com/paulscherrerinstitute/flurm

  3. Dynamic Nodes, Slurm Administrator Documentation
     https://slurm.schedmd.com/dynamic_nodes.html

  4. Cloudy, With a Chance of Dynamic Nodes, SLUG’22
     https://slurm.schedmd.com/SLUG22/Dynamic_Nodes.pdf

  5. Dynamic Nodes - Slurm Configuration, SchedMD Documentation
     https://slurm.schedmd.com/dynamic_nodes.html#config

  6. Node Configuration, slurm.conf Manual
     https://slurm.schedmd.com/slurm.conf.html#SECTION_NODE-CONFIGURATION