ClusterShell: Cluster Management Toolkit

Reference · SSH
Published: July 24, 2024
Modified: January 21, 2025

What is ClusterShell1?

Execute remote shell commands on many nodes in parallel

  • …parallel execution framework
    • …copy files, execute shell commands & gather results
    • …default remote shell command is SSH
  • …scalable to hundreds of nodes
  • Python-based…
    • …event-driven library interface
    • …asynchronous & non-blocking I/O

Focus on daily ad-hoc node administration and operations

  • …supports provisioning & installation of compute clusters
  • …enables admins to efficiently perform daily operations

Naming Conventions

Cluster node naming schemas…

  • …systematic …common node prefix …plus numbering scheme
  • …simple numbering for individual nodes
  • Sometimes …multi-dimensional numbering scheme …examples:
    • …physical location …server room …rack & slot
    • …logical position …according to network topology

Node Sets

Define a static list of nodes belonging to a cluster:

wlm01,dbm01
node[02-08]
node[10-24,34].ipmi
room[1,2]rack[01-14]

node[01-10]!node03         ⇒ node[01-02,04-10]
node[01-03]&node[02-10]    ⇒ node[02-03]

Syntax to specify cluster nodes, aka host names…

  • Host name specification…
    • , separates naming patterns
    • […] for example [01-15,20,22-28]
      • …multiple numbering ranges …separated by ,
      • …range with start & end number …joined by a - dash
      • …padding with leading zeros
  • Set operations:
    • , union
    • ! difference
    • & intersection
    • ^ symmetric difference
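
For example, union & symmetric difference on illustrative node sets:

node[01-05],node[04-08]    ⇒ node[01-08]
node[01-10]^node[05-15]    ⇒ node[01-04,11-15]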

Node Groups

Collection of nodes …generated by an external source:

  • …bound to a group source in the configuration
  • Example use-cases…
    • …read from a hardware inventory
    • …query an asset management database
    • …query a resource management system (for example Slurm)

A node group is evaluated into a node set when needed

@slurm:partition:main
@slurm:job:397584235
@slurm:partition:main!@slurm:state:down

@room:2&@rack:b2
@cpu:amd!@gpu:nvidia

Unified node group notation:

  • @ prefix marks a node group
  • : separates the group source from the group name

Combine static node sets & dynamic node groups
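
For example (a sketch assuming the slurm group source from the expressions above), exclude a static set of broken nodes from a dynamic partition group:

@slurm:partition:main!node[05-08]

…or restrict a static node set to the nodes a group reports as down:

node[01-99]&@slurm:state:down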

Commands

Install required packages

# RPM-based Linux distributions
sudo dnf install -y clustershell sshpass parallel

List of the ClusterShell commands

  • nodeset
    • …efficient handling of node sets & node group bindings
    • …unifies the node group syntax
    • …uses external sources to generate node groups (like Slurm)
  • clush
    • …executes commands in parallel
    • …copies files in parallel
    • …displays command results
    • …gathers output …grouped by node sets with identical output
  • clubak …gathers & merges identical command outputs
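
For example, gathering with clubak (node names are placeholders):

# consolidate identical output lines from multiple nodes
clush -w 'node[01-03]' uname -r | clubak -b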

nodeset

nodeset2 is used to manipulate node sets & groups

>>> echo 'node01,node03,node10' | nodeset -f
node[01,03,10]

>>> nodeset -e 'node[01,03-05],box[12,17-20]'    
box12 box17 box18 box19 box20 node01 node03 node04 node05

  • -c count nodes in a set
  • fold & expand
    • -f folds (compacts) to a nodeset
    • -e expands a nodeset into a list
  • expansion format
    • -S separator string …for example \n or ,
    • -O string format %s (Python builtin)

>>> nodeset -eS ',' 'node[01-05]'
node01,node02,node03,node04,node05

>>> nodeset -eS '\n' -O '%s.ipmi' 'wlm01,dbm01,node[01-03]'
dbm01.ipmi
node01.ipmi
node02.ipmi
node03.ipmi
wlm01.ipmi

Shell Pipes

Examples using nodeset with other commands

# collect the SSH host key of each node via a bastion host
nodeset -e 'node[01,03-05]' \
    | xargs -n1 -- ssh $bastion ssh-keyscan >> ~/.ssh/known_hosts

# ping multiple nodes in parallel
fping $(nodeset -e 'node[01,03-05]')

# collect a node set from the Slurm workload management system
sinfo -h -N -o '%n' -t allocated,mixed | nodeset -f

Shell Scripts

Use nodeset in a shell-function:

ssh-host-key-remove() {
        for node in $(nodeset -e "$1")
        do
                # remove the fully qualified host name
                ssh-keygen -qR "$node"
                # ...the host name without the domain suffix
                ssh-keygen -qR "$(echo "$node" | cut -d. -f1)"
                # ...and the node IP address
                ssh-keygen -qR "$(host "$node" | cut -d' ' -f4)"
        done
}
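
Usage, for example:

# remove stale host keys before re-installing a set of nodes
ssh-host-key-remove 'node[01,03-05]'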

Environment Variables

CLUSTER_NODES — shell function used as a common store for a node set

  • …manages the $CLUSTER_NODES environment variable
  • …facilitates use of the same node set by multiple commands

CLUSTER_NODES() {
        # Export the CLUSTER_NODES variable to the environment
        # ...if input comes from STDIN...
        if [ ! -t 0 ] ; then
                read -r stdin
                # ...export the CLUSTER_NODES variable to the environment
                export CLUSTER_NODES="$stdin"
        # ...if a single command-line argument is present...
        elif [ $# -eq 1 ] ; then
                export CLUSTER_NODES="$1"
        # Otherwise if no command-line argument is present...
        elif [ $# -eq 0 ] ; then
                # ...make sure the CLUSTER_NODES variable is present
                if [ -n "$CLUSTER_NODES" ]
                then
                        echo "$CLUSTER_NODES"
                else
                        echo 1>&2 "CLUSTER_NODES environment variable is empty, unset or blank!"
                fi
        # Catch-all if more than a single command-line argument is present
        else
                echo 1>&2 "Error: No argument or a single nodeset argument required!"
        fi
}

Store a node set for re-use:

# pipe a node set via stdin
echo 'node01,node03,node10' | nodeset -f | CLUSTER_NODES

# passing a node set as argument
CLUSTER_NODES $(echo 'node01,node03,node10' | nodeset -f)

Inspect the content:

>>> echo $CLUSTER_NODES                                      
node[01,03,10]

>>> CLUSTER_NODES                                            
node[01,03,10]

Use with clush -w:

clush -w "$(CLUSTER_NODES)" #…

clush

ToDo
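
A few basic invocations (node names are placeholders):

# execute a command on a node set in parallel
clush -w 'node[01-03]' -- uname -r

# gather & fold identical outputs (-b, like clubak)
clush -b -w 'node[01-03]' -- uname -r

# copy a file to all nodes in parallel
clush -w 'node[01-03]' --copy /etc/hosts --dest /etc/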

Configuration

Environment Variable    Description
$CLUSTERSHELL_CFGDIR    Path to the ClusterShell configuration directory

Create a test environment:

pushd $(mktemp -d /tmp/$USER-clustershell-XXXXXX)

# set path to the Clustershell configuration files
export CLUSTERSHELL_CFGDIR=$PWD

# basic directory structure
mkdir groups.d groups.conf.d clush.conf.d
# an empty local.cfg is required for the default local group source
touch groups.d/local.cfg
# start of configuration
cat > clush.conf <<EOF
[Main]
connect_timeout: 30
verbosity: 2
EOF

# clean up
popd && rm -rf /tmp/$USER-clustershell*

Run Modes

Run modes3 — group of configuration settings associated with a name:

Option        Description
-m, --mode    Select a specific run mode

Add configuration files to $CLUSTERSHELL_CFGDIR/clush.conf.d/*.conf

SSH Passwords

Use sshpass to automate password prompts …if no SSH keys are deployed:

cat > clush.conf.d/sshpass.conf <<EOF
[mode:sshpass]
password_prompt: yes
ssh_path: /usr/bin/sshpass /usr/bin/ssh
scp_path: /usr/bin/sshpass /usr/bin/scp
ssh_options: -oBatchMode=no -oPreferredAuthentications=password -oPubkeyAuthentication=no

[mode:sshpass-file]
password_prompt: no
ssh_path: /usr/bin/sshpass -f password.txt /usr/bin/ssh
scp_path: /usr/bin/sshpass -f password.txt /usr/bin/scp
ssh_options: -oBatchMode=no -oPreferredAuthentications=password -oPubkeyAuthentication=no

[mode:sshpass-env]
password_prompt: no
ssh_path: /usr/bin/sshpass -e /usr/bin/ssh
scp_path: /usr/bin/sshpass -e /usr/bin/scp
ssh_options: -oBatchMode=no -oPreferredAuthentications=password -oPubkeyAuthentication=no
EOF

…select the required mode with option -m

# interactive prompt for a password
clush -m sshpass #...

# read password from a text file
echo secret123 > password.txt
clush -m sshpass-file #...

# read password from the SSHPASS environment variable
export SSHPASS=secret123
clush -m sshpass-env  #...

SSH Known Hosts

Disable strict host key checking on SSH login:

cat > clush.conf.d/ssh-knownhosts.conf <<EOF
[mode:lax]
ssh_options: -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -q
EOF
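
…select the mode with option -m, for example on first login after provisioning:

# accept unknown host keys (node names are placeholders)
clush -m lax -w 'node[01-03]' -- true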

Node Groups

Node groups4 — Syntax to represent a collection of nodes

  • …source of truth for admins …naming convention for nodes
  • …convenient way to define & manipulate groups of nodes aka nodeset5
  • …simplifies group provisioning (…through user-defined external shell commands)
  • …node groups may be very dynamic …their nodes may change often

The node group expression is @source:group

# list configured group sources
nodeset --groupsources

# list available groups from the **default source**
nodeset -l

# list groups from a specific group source 
nodeset -l -s $source # (do not use `@` as prefix)

# list groups from all available group sources
nodeset -L

Node groups in basic commands:

# group expansion
nodeset -e @$source:$group

# find the node groups a given node belongs to
nodeset -l $node
nodeset -s $source -ll $node

# set operations …exclude a specific node group
nodeset -f @$source:$group -x @$source:$group
nodeset -f "@$source:$group!@$source:$group"

Group Sources

groups.conf6 — …defines group sources

  • …how to access the data source (simple shell command)
  • …multiple configuration options: file-based7 or external

groups.conf
[Main]
default: genders
confdir: $CFGDIR/groups.conf.d
autodir: $CFGDIR/groups.d

# external source
[genders,g]
map: nodeattr -f path/to/genders -n $GROUP
list: nodeattr -f path/to/genders -l

Sections in groups.conf:

  • [Main]
    • default …default node group source
    • confdir …path to look for group configuration …suffix .conf
    • autodir …path to group dictionaries …suffix .yaml (example after this list)
  • Following sections…
    • …define additional group sources
    • …group source name (example genders) …should not include : colon
    • …can be a comma-separated list of source names
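
File-based group sources are YAML dictionaries in the autodir path …a minimal sketch (source & group names are illustrative):

groups.d/cluster.yaml
roles:
    service: 'wlm01,dbm01'
    compute: 'node[01-10]'

# resolve a group from the file-based source
nodeset -f @roles:compute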

Slurm

groups.conf.d/slurm.conf
[partition]
map: ssh login.example.org -- sinfo -h -o "%N" -p $GROUP
all: ssh login.example.org -- sinfo -h -o "%N"
list: ssh login.example.org -- sinfo -h -o "%R"
reverse: ssh login.example.org -- sinfo -h -N -o "%R" -n $NODE

[state]
map: ssh login.example.org -- sinfo -h -o "%N" -t $GROUP
all: ssh login.example.org -- sinfo -h -o "%N"
list: ssh login.example.org -- sinfo -h -o "%T" | tr -d '*~#$@+'
reverse: ssh login.example.org -- sinfo -h -N -o "%T" -n $NODE | tr -d '*~#$@+'
cache_time: 60

[job]
map: ssh login.example.org -- squeue -h -j $GROUP -o "%N"
list: ssh login.example.org -- squeue -h -o "%i" -t R
reverse: ssh login.example.org -- squeue -h -w $NODE -o "%i"
cache_time: 60

[user]
map: ssh login.example.org -- squeue -h -u $GROUP -o "%N" -t R
list: ssh login.example.org -- squeue -h -o "%u" -t R
reverse: ssh login.example.org -- squeue -h -w $NODE -o "%i"
cache_time: 60

Example usage…

# check the load on all nodes in the debug partition
clush -w @partition:debug -- uptime

# check the load on all nodes a specific user has allocated
clush -l root -w @user:jdoe -- uptime

# check the /tmp directory on a node executing a specific job
clush -l root -w @job:20268713 -- 'df -Ph /tmp'

# reboot all nodes in state drained
clush -l root -w @state:drained -- systemctl reboot

Genders

Genders8 — static cluster node inventory

# on RPM-based systems
sudo dnf install -y genders

Example configuration file:

$PWD/genders
wlm01       service:slurmctld
dbm01       service:mariadb
node[01-10] service:slurmd

wlm01,dbm01 class:service
node[01-02] class:submit
node[03-10] class:compute

node[01-05] cpu:amd,cpu-type:epyc_9354
node06      cpu:amd,gpu:amd,gpu-type:mi100
node[07-10] cpu:intel,gpu:nvidia,gpu-type:h100

node[01-05] rack:l1r4,pdu:l1r4b
node[06-10] rack:l2r1,pdu:l2r1a

# check syntax of the configuration file
nodeattr -f genders -k

# use configuration file in working directory
alias nodeattr="nodeattr -f genders"

nodeattr

  • -f — path to a custom configuration file
  • -k — check syntax of the configuration file
  • -A — list all nodes (comma separated)
  • -l — list node attributes
  • List nodes that match the specified query…
    • -q — nodeset format
    • -c — comma separated
    • -n — line-feed separated
    • -s — space separated
  • Set operations…
    • || — union
    • && — intersection
    • -- — difference
    • ~ — complement

# list all nodes
>>> nodeattr -A
dbm01,node[01-10],wlm01

# all attributes for a specific node
>>> nodeattr -l node01
rack:l1r4
cpu-type:epyc_9354
pdu:l1r4b
class:submit
cpu:amd
service:slurmd

# list nodes matching an attribute
>>> nodeattr -q cpu:amd
node[01-06]

# …same but space separated
>>> nodeattr -s cpu:amd
node01 node02 node03 node04 node05 node06

# list nodes matching two attributes
>>> nodeattr -q 'class:compute&&cpu:intel'    
node[07-10]