Slurm - Multifactor Priorities
scontrol & sprio commands, Reservation & Fair-Share
Multifactor priority plugin PriorityType=priority/multifactor
…
- …ordering the queue of jobs based on several factors
- …highest priority jobs not necessarily evaluated first
- Job evaluation in the following order (top-down)…
- …preemption (preemptor higher priority than preemptee)
- …reservation (jobs with advanced reservation have higher priority)
- …partition priority
- …job priority
- …submit time
- …job ID
sprio -w
…lists the configured weights for each factor
- …a weight can be assigned to each factor
- …enact a policy that blends a combination of factors
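A minimal sketch of the corresponding slurm.conf configuration (the weight values are illustrative, not a recommendation), followed by inspecting the factor breakdown of a single pending job:

# ...enable the multifactor plugin in slurm.conf (illustrative weights)
PriorityType=priority/multifactor
PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightPartition=1000
PriorityWeightQOS=10000

# ...show the per-factor priority contributions for a pending job
sprio -l -j $job_id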
Reservation
Advanced Resource Reservation Guide, SchedMD:
- …reserve resources for jobs being executed by select users and/or accounts
- …identifies the resources in that reservation and a time period
- …resources reserved include cores, nodes, licenses and/or burst buffers
- …reservation contains nodes or cores associated with one partition
- …with the exception of a reservation created with explicitly requested nodes
List available reservations…
- ReservationName= …identifier used to allocate resources from the reservation
- Users= and Accounts= …access privileges associated to a reservation
# ...list all reservations in the system
sinfo -T
scontrol show reservations
Duration & Flags
Following is a subset of specifications (refer to the corresponding section in the scontrol
manual page):
starttime=
- …YYYY-MM-DD[THH:MM]
- …or now[+time], where time is a count with a time unit (minutes, hours, days, or weeks)

endtime=
- …YYYY-MM-DD[THH:MM]
- …alternatively use duration …[[days-]hours:]minutes or UNLIMITED/infinite

flags=<list>
- maint …identify system maintenance for the accounting
- ignore_jobs …ignore jobs running during the reserved time
- daily or weekly …recurring reservation (see the example below)
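A sketch of a recurring reservation built from these specifications; the reservation name, start date, node list, and user are hypothetical:

# ...reserve two nodes for user alice daily from 17:00 for one hour
scontrol create reservation reservationname=daily_debug \
       starttime=2024-06-01T17:00 duration=60 \
       user=alice flags=daily nodes='lxbk[0700-0701]'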
Reserve an entire cluster at a particular time for system downtime:
scontrol create reservation starttime=$starttime \
duration=120 user=root flags=maint,ignore_jobs nodes=ALL
Reserve a specific node to investigate a problem:
scontrol create reservation starttime=now \
       nodes=$node user=root duration=infinite flags=maint
Remove a reservation from the system:
scontrol delete reservation=$name
Resources
By default, reservations must not overlap. They must either include different nodes or operate at different times. If specific nodes are not specified when a reservation is created, Slurm will automatically select nodes to avoid overlap and ensure that the selected nodes are available when the reservation begins. … Note a reservation having a maint or overlap flag will not have resources removed from it by a subsequent reservation also having a maint or overlap flag, so nesting of reservations only works to a depth of two.
Options…
- nodecnt=<num> …number of nodes to reserve (selected by the scheduler); the suffix in nodecnt=1k multiplies by 1024
- nodes= …nodeset to use; nodes=all reserves all nodes in the cluster
- feature= …only nodes with a specific feature
# specific set of nodes
scontrol ... nodes='lxbk[0700-0720],lxbk[1000-1002]' ...
# all nodes in a partition
scontrol ... partitionname=long nodes=all
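A sketch letting the scheduler pick nodes by feature; the feature name skylake is hypothetical and must be defined on the nodes:

# ...reserve 16 nodes carrying a specific feature
scontrol create reservation starttime=now duration=60 \
       user=root nodecnt=16 feature=skylake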
Accounts & Users
Reservations can be created for the use of specific accounts and/or users…
- …if users and accounts are specified
- …job must match both in order to use the reservation
Options…
- accounts=fire,ice …comma-separated list of allowed accounts
- accounts=-water,-earth …allow all accounts except those listed
- users=alice,bob …comma-separated list of allowed users
- users=-zack …deny access for listed users
Add/remove individual accounts/users from an existing reservation
- …adding a ‘+’ or ‘-’ sign before the ‘=’ sign
- …if accounts are denied access to a reservation (account name preceded by a ‘-’)
- …then all other accounts are implicitly allowed
- …not possible to also explicitly specify allowed accounts
# ...add an account to an existing reservation
scontrol update reservation=$name accounts+=fire
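The ‘-’ form removes entries analogously:

# ...remove a user from an existing reservation
scontrol update reservation=$name users-=bob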
Usage
Reference a resource reservation with salloc, srun, and sbatch…
- …option --reservation=<name> …allocate resources in the specified reservation
- …if a resource reservation provides nodes from multiple partitions…
- …required to use the --partition= option as well (see the example below)
- …otherwise the scheduler can not determine which resources to use
# ...request a specific reservation for allocation
sbatch --reservation=$name ...
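If the reservation provides nodes from multiple partitions, name the target partition explicitly, for example:

# ...select the partition to use within the reservation
sbatch --reservation=$name --partition=long ...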
Alternatively use the following input environment variables:

Environment Variable | Description
---|---
SLURM_RESERVATION | Use a reservation with srun.
SALLOC_RESERVATION | Use a reservation with salloc.
SBATCH_RESERVATION | Use a reservation with sbatch.
Priority Factors
List of configurable priority factors…
Factor | Description
---|---
age | …time the job has been waiting in the queue
association | …factor defined for a job's association
fair-share | …relation to resources consumed in the past
size | …size of the resources a job allocates
nice | …factor controlled by users
partition | …priority associated to a partition
qos | …quality of service associated with the job
site | …factor dictated by admins
tres | …factor associated to the resources requested
- …sum of all the factors that have been enabled
- …integer that ranges between 0 and 4294967295
- …the larger the number …the higher the job will be positioned in the queue
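For reference, the weighted sum as given in the SchedMD multifactor priority documentation (each factor is a floating point number between 0.0 and 1.0):

Job_priority =
    site_factor +
    (PriorityWeightAge) * (age_factor) +
    (PriorityWeightAssoc) * (assoc_factor) +
    (PriorityWeightFairshare) * (fair-share_factor) +
    (PriorityWeightJobSize) * (job_size_factor) +
    (PriorityWeightPartition) * (partition_factor) +
    (PriorityWeightQOS) * (QOS_factor) +
    SUM(TRES_weight_<type> * TRES_factor_<type>, ...)
    - nice_factor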
# ...list jobs in priority order with requested resources
squeue --priority --format="%.10A %.8Q %.3D %.3H %.3I %.3J %.10l %.10m %n" --sort=-p,i --state=PD
# ...modify the priority of a job
scontrol update job=$job_id priority=$priority
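Related, holding a pending job sets its priority to 0 until it is released:

# ...prevent a pending job from being scheduled
scontrol hold $job_id
# ...make the job eligible again
scontrol release $job_id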
Nice
Users can adjust the priority of their own jobs…
- …positive values negatively impact a job’s priority
- …negative values increase a job’s priority (privileged users only)
- …adjustment range of +/-2147483645
- …the backfill algorithm may still run a lower-priority job before a higher-priority job
# ...put specified job first in queue for user
scontrol top $job_list
# ...specify a low-priority job
sbatch --nice=10000 #...
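The nice adjustment of a pending job can also be changed after submission, as a sketch:

# ...lower the priority of an already submitted job
scontrol update jobid=$job_id nice=5000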