Flux Resource Management Framework

Integration with HPC & Cloud Infrastructure

HPC
Published

June 28, 2023

Modified

February 8, 2024

What does Flux do?

Flux 1 extends the traditional model of HPC resource management…

  • …developed for extreme-scale science and exascale computing
  • convergence of HPC with machine learning (ML) and cloud computing
  • …built to facilitate new hardware resource types…
    • …hybrid (or heterogeneous) combinations of processors
    • …GPUs and other accelerators
    • …multi-tiered disk storage
    • …including methods for power efficiency

Combines fully hierarchical resource management with graph-based scheduling…

  • …solves three primary deficiencies of existing workload managers
    • manages all types of resources …bare-metal, VMs, cloud and HPC
    • …scales from a local workstation (laptop) to large-scale HPC infrastructure
  • …recursively creates nested instances…
    • one scheduler instance per user …improves robustness and execution performance
    • …user workflows can easily and automatically sub-divide their jobs into arbitrarily small tasks
  • …jobs/tasks can connect to one another through messaging overlays
    • data stores built directly into Flux
    • …breaking down the job coordination barrier

Graph-Based Scheduling

Manage complex combinations of resources …heterogeneous, dynamic systems …local and cloud

  • …resource representation (a model for characterizing resources) on a directed graph
  • …capable of dynamically defining arbitrary resource types
  • …mathematical structure that associates…
    • …objects …vertices …(e.g., hardware, software, power distribution units)
    • …via directed relationships …edges …indicate containment (e.g., a server contains a CPU)
  • …resource request consists of descending into the graph and checking vertices for suitability
  • …allocate resources in different ways based on paths …permits priority based on proximity
  • …scheduling operations are basic procedures in the context of directed graphs
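
The graph model above can be illustrated with a small, self-contained sketch. This is not Fluxion code: the `Vertex` class and the `match` and `allocate` functions are hypothetical names chosen for illustration, and the cluster layout is made up. The sketch only shows the idea of containment edges and of satisfying a request by descending into the graph and checking each vertex for suitability.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A resource vertex; `kind` is an arbitrary, user-defined resource type."""
    kind: str
    name: str
    free: bool = True
    children: list["Vertex"] = field(default_factory=list)  # containment edges

    def contains(self, child: "Vertex") -> "Vertex":
        """Add a containment edge from this vertex to `child`."""
        self.children.append(child)
        return child


def match(root: Vertex, kind: str, count: int) -> list[Vertex]:
    """Depth-first descent that collects up to `count` free vertices of `kind`."""
    found: list[Vertex] = []
    stack = [root]
    while stack and len(found) < count:
        vertex = stack.pop()
        if vertex.kind == kind and vertex.free:
            found.append(vertex)
        # Reverse so that children are visited in insertion order.
        stack.extend(reversed(vertex.children))
    return found


def allocate(root: Vertex, kind: str, count: int) -> list[str]:
    """Mark matching vertices as allocated, or allocate nothing if too few exist."""
    picked = match(root, kind, count)
    if len(picked) < count:
        return []
    for vertex in picked:
        vertex.free = False
    return [vertex.name for vertex in picked]


# Hypothetical cluster: 2 nodes, each containing 2 cores and 1 GPU.
cluster = Vertex("cluster", "cluster0")
for n in range(2):
    node = cluster.contains(Vertex("node", f"node{n}"))
    for c in range(2):
        node.contains(Vertex("core", f"node{n}/core{c}"))
    node.contains(Vertex("gpu", f"node{n}/gpu0"))

print(allocate(cluster, "core", 3))  # fills node0 first, then spills into node1
print(allocate(cluster, "core", 2))  # only one core is left, so nothing is allocated
print(allocate(cluster, "gpu", 1))
```

Because the descent follows containment edges in order, cores under the same node are picked before cores on another node, which is a very simple form of the proximity-based preference mentioned above. A real scheduler would add policies, reservations, and resource types well beyond this sketch.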

Hierarchical Management

Divide-and-conquer approach…

  • …resources divided among schedulers in the hierarchy …increases scalability
  • Three distinct principles…
    • …parent Flux instance grants resource allocations to its children
    • each instance configured independently …responsible for effective use of resources
    • …first two principles apply recursively from the top of the resource hierarchy
  • …instances to delegate work to child instances …spreading the load
  • …creation of the appropriate number of Flux instances for each workflow
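
The three principles can be sketched in a few lines. The `Instance` class below is a hypothetical stand-in, not the Flux API, and it simplifies a resource allocation to a list of core ids. A parent grants part of its cores to a child, each child manages only what it was granted, and the same rule applies again one level down.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A hypothetical stand-in for a Flux instance that owns a set of cores."""
    name: str
    cores: list[int]  # cores not yet granted to a child
    children: list["Instance"] = field(default_factory=list)

    def spawn(self, name: str, ncores: int) -> "Instance":
        """Grant `ncores` of this instance's unassigned cores to a new child."""
        if ncores > len(self.cores):
            raise ValueError(f"{self.name} has only {len(self.cores)} cores left")
        granted, self.cores = self.cores[:ncores], self.cores[ncores:]
        child = Instance(name, granted)
        self.children.append(child)
        return child

    def depth_first(self, depth: int = 0):
        """Yield (depth, instance) pairs from this instance downwards."""
        yield depth, self
        for child in self.children:
            yield from child.depth_first(depth + 1)


# Made-up hierarchy: a 16-core system instance with two users.
system = Instance("system", list(range(16)))
alice = system.spawn("alice", 8)
alice.spawn("alice/ensemble-a", 4)  # the same rule applies recursively
alice.spawn("alice/ensemble-b", 4)
system.spawn("bob", 4)

for depth, inst in system.depth_first():
    print("  " * depth + f"{inst.name}: {len(inst.cores)} unassigned cores")
```

Each level only ever sees the cores it was granted, so a child cannot over-commit its parent, and scheduling work is spread across the hierarchy instead of being concentrated in one scheduler.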

Fluxion …scheduler component …scalable graph-based scheduling techniques

Converged Computing

…coexistence of (traditional) HPC and Cloud resources

  • Fluence 2 …the Flux scheduler swapped in for kube-scheduler…
    • …HPC-oriented technology swapped into Kubernetes for cloud-native orchestration
  • Flux Operator 3 …for Kubernetes
    • …create & control an HPC workload manager inside Kubernetes
    • …“mini”-cluster scheduled in Kubernetes with Flux fine-grained resource mapping
  • …reference to video talks 4 5

Single-User Mode

Flux allows for both single-user and multi-user modes…

  • …take a look at “Introduction to Flux” 6
  • …single-user mode → overlay workload manager
  • …on top of the native system-level workload manager (like Slurm)

Provides users with comprehensive control over the resources within their own allocation

  • …streamline application coupling, coordination, and dependency management
  • …set up customized hierarchies
    • …policies based on the graph-based resource model
    • …scheduling options such as queue depths and throttling of jobs
    • …ensemble-based workflow …short-duration, single-core jobs spin up a network of nested Flux instances

First Steps

In a container…

# ...Flux instance from a container
>>> podman run -ti fluxrm/flux-sched:latest
ƒ(s=1,d=0) fluxuser@afac2f6d30de:~$ flux --help

# ...emulate a multi-node deployment
>>> podman run -ti fluxrm/flux-sched flux start --test-size 4
ƒ(s=4,d=0) fluxuser@e67d73ebe096:~$ flux resource list
     STATE NNODES   NCORES    NGPUS NODELIST
      free      4       16        0 e67d73ebe[096,096,096,096]
 allocated      0        0        0 
      down      0        0        0 

First job…

# ...submits a job which will be scheduled and run in the background
>>> flux submit hostname
ƒCUHCibq     # ...returns job ID

# ...run a job interactively
>>> flux run hostname
7f995365e9a2

# ...submit and list jobs
>>> flux submit sleep 360
ƒ2MU1GM7V
>>> flux submit sleep 360
ƒ2MtVYV5q
>>> flux jobs
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒ2MtVYV5q fluxuser sleep       R      1      1   3.935s 7f995365e9a2
   ƒ2MU1GM7V fluxuser sleep       R      1      1   4.893s 7f995365e9a2

# ...inspect a job
flux job info ƒ2MtVYV5q jobspec | jq

# ...summary of all jobs
flux top

Job Management

flux submit queues jobs… --cc (i.e. carbon copy) duplicates jobs …--wait waits for job completion

# ...submit different jobs for demonstration
flux submit --cc=0-1 --wait /bin/true
flux submit --cc=0-1 --wait /bin/false
flux submit --cc=0-7 sleep inf
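
As a rough illustration of how many copies each `--cc` range above produces, the following sketch expands a range into individual ids. The `expand_ids` helper is hypothetical and is not how Flux parses the option; it assumes the `A-B` range is inclusive, and the comma-separated form is an additional assumption that does not appear in the commands above.

```python
def expand_ids(spec: str) -> list[int]:
    """Expand a range list such as "0-1" or "0-3,8" into individual ids."""
    ids: list[int] = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        # A part without "-" is a single id, so it expands to itself.
        ids.extend(range(int(first), int(last or first) + 1))
    return ids


for spec in ("0-1", "0-7"):
    print(f"--cc={spec} -> {len(expand_ids(spec))} copies")
```

Under these assumptions `--cc=0-1` submits two copies and `--cc=0-7` submits eight.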

flux jobs …states …R (running), CD (completed), F (failed), and CA (canceled)

# ...list running (& pending) jobs
>>> flux jobs
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒCV3ko224 fluxuser sleep       R      1      1   3.961m 7f995365e9a2
   ƒCV3ko223 fluxuser sleep       R      1      1   3.961m 7f995365e9a2
# [...]

# ...list inactive (completed, failed, canceled) jobs
>>> flux jobs --filter=inactive
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒBqMEQEw9 fluxuser false       F      1      1   0.034s 7f995365e9a2
   ƒBqMEQEwA fluxuser false       F      1      1   0.029s 7f995365e9a2
   ƒBFFuA8AL fluxuser true       CD      1      1   0.036s 7f995365e9a2
# [...]

# ...many filter and format options available
>>> flux jobs --format=long ƒCV3ko224 ƒCV3gM4B2 
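
The state column in the listings above is easy to post-process. The following is a minimal sketch that counts jobs per state from the plain-text table; the rows are copied from the listings in this section, and the `count_states` helper is a hypothetical name. It assumes whitespace-separated columns, so it would break on job names that contain spaces.

```python
from collections import Counter

# State abbreviations as listed in the text above.
STATES = {"R": "running", "CD": "completed", "F": "failed", "CA": "canceled"}

# Sample rows copied from the listings in this section.
LISTING = """\
       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
   ƒCV3ko224 fluxuser sleep       R      1      1   3.961m 7f995365e9a2
   ƒBqMEQEw9 fluxuser false       F      1      1   0.034s 7f995365e9a2
   ƒBqMEQEwA fluxuser false       F      1      1   0.029s 7f995365e9a2
   ƒBFFuA8AL fluxuser true       CD      1      1   0.036s 7f995365e9a2
"""


def count_states(listing: str) -> Counter:
    """Count jobs per state from whitespace-separated `flux jobs` rows."""
    header, *rows = listing.strip().splitlines()
    column = header.split().index("ST")  # locate the state column by its header
    return Counter(
        STATES.get(row.split()[column], "other") for row in rows if row.strip()
    )


print(dict(count_states(LISTING)))
```

For anything beyond a quick check, a machine-readable output format is a more robust basis than parsing the aligned table.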

With Slurm

Flux places no requirements on the cluster resource provider (the underlying Slurm cluster) 7:

  • …changes notion of compute jobs …language to describe these
  • Flux keeps track of hardware …resources within a user-allocation
    • …the notion of jobs is then independent from the parent system
    • …user interaction completely isolated within Flux
  • …enables quick error recovery …optimization & increased job throughput