HPC Network Interconnects

HPC, Network

Published: March 26, 2018
Modified: July 17, 2022

Network vs. Fabric

  • network
    • designed as universal interconnect
    • vendor interoperability by design (for example, Ethernet)
    • all-to-all communication for any application
  • fabric
    • designed as optimized interconnect
    • single-vendor solution (Mellanox InfiniBand, Intel Omni-Path)
    • single system build for a specific application
    • spread network traffic across multiple physical links (multipath)
    • scalable fat-tree and mesh topologies
    • more sophisticated routing to allow redundancy and high throughput
    • non-blocking interconnect (no over-subscription); see the sizing sketch after this list
    • low-latency, layer-2-style connectivity
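
A minimal sizing sketch for the fat-tree and over-subscription points above, assuming a two-tier leaf/spine layout and an illustrative 36-port switch radix (both are assumptions, not tied to a specific product):

```c
/* Two-level (leaf/spine) fat-tree sizing sketch.
 * Assumption: every leaf dedicates half its ports to hosts and half to
 * spines, with exactly one link from each leaf to each spine. */
#include <stdio.h>

int main(void) {
    int radix = 36;                   /* assumed switch port count        */
    int hosts_per_leaf = radix / 2;   /* half the ports face hosts ...    */
    int uplinks_per_leaf = radix / 2; /* ... the other half face spines   */

    int spines = uplinks_per_leaf;    /* one uplink per spine per leaf    */
    int leaves = radix;               /* each spine port serves one leaf  */
    int max_hosts = leaves * hosts_per_leaf;

    /* over-subscription at the leaf: host-facing vs. spine-facing capacity;
     * 1.0 means non-blocking (every host can send at full rate at once)  */
    double oversub = (double)hosts_per_leaf / uplinks_per_leaf;

    printf("leaves=%d spines=%d max_hosts=%d over-subscription=%.1f:1\n",
           leaves, spines, max_hosts, oversub);
    return 0;
}
```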

Offload vs. Onload

  • network functions performed mostly in software “onload” (Ethernet, Omni-Path) [^2]
    • requires CPU resources ⇒ decreases cycles available to hosted applications
  • network functions performed by hardware “offload” (InfiniBand, RoCE), a.k.a. intelligent interconnect
    • Network hardware performs communication operations (including data aggregation)
    • Increases resource availability of the CPU (improves overall efficiency)
    • Particularly advantageous in scatter/gather-type collective operations (see the MPI sketch after this list)
  • trade-off
    • more capable network infrastructure (offload) vs. incrementally more CPUs on servers (onload)
    • advantage of offloading increases with the size of the interconnected clusters (higher node count = more messaging)
  • comparison of InfiniBand & Omni-Path [^1]
    • message rate test (excluding the overhead of data polling) to understand the impact of the network protocol on CPU utilization
    • result: InfiniBand CPU resource utilization <1%, Omni-Path >40%
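
Where the offload advantage shows up in application code: the sketch below uses a standard MPI non-blocking collective (MPI_Iallreduce) to overlap computation with an allreduce. With in-network aggregation the reduction can progress in the fabric while the CPU keeps computing; an onload stack spends host cycles on the same call. The buffer size and the local-work loop are illustrative assumptions.

```c
/* Overlapping computation with a collective via MPI_Iallreduce. */
#include <mpi.h>
#include <stdio.h>

#define N 1024   /* assumed buffer size, illustrative only */

static void do_local_work(double *x, int n) {
    /* placeholder for application compute that is independent of the reduction */
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 0.5 + 1.0;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double send[N], recv[N], other[N];
    for (int i = 0; i < N; i++) { send[i] = i; other[i] = i; }

    MPI_Request req;
    /* start the reduction; an offload-capable fabric can aggregate the data
     * in hardware while this rank continues with its own work below */
    MPI_Iallreduce(send, recv, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);

    do_local_work(other, N);   /* CPU cycles go to the application,
                                  not to driving the network */

    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("recv[0] = %f\n", recv[0]);

    MPI_Finalize();
    return 0;
}
```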

Comparison

  • Ethernet 10/25/40/50/100G (200G in 2018/19)
    • Widely in production, supported by many manufacturers (Cisco, Brocade, Juniper, etc.)
    • Easy to deploy, widespread expert knowledge
    • “High” latency (microseconds rather than hundreds of nanoseconds); see the transfer-time sketch after this list
  • InfiniBand 40/56/100G (200G 2017)
    • Widely used in HPC, cf. TOP500
    • De facto led by Mellanox
  • Omni-Path 100G (future roadmap?)
    • Intel proprietary
    • Still in its infancy (very few production installations)
    • Claims better bandwidth/latency/message rate than InfiniBand
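
A first-order worked example of why the latency gap matters for small messages, using t = latency + size / bandwidth; the latency and bandwidth figures below are illustrative assumptions, not vendor measurements:

```c
/* First-order message transfer time: t = latency + size / bandwidth.
 * The figures are illustrative assumptions and ignore protocol overhead. */
#include <stdio.h>

int main(void) {
    struct link { const char *name; double latency_s; double bw_bytes_s; };
    struct link links[] = {
        { "Ethernet-like   (10 us, 100 Gb/s)", 10e-6, 100e9 / 8 },
        { "InfiniBand-like ( 1 us, 100 Gb/s)",  1e-6, 100e9 / 8 },
    };
    double sizes[] = { 64, 4096, 1 << 20 };   /* bytes: small, medium, 1 MiB */

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++) {
            double t = links[i].latency_s + sizes[j] / links[i].bw_bytes_s;
            printf("%-38s %9.0f B -> %10.2f us\n",
                   links[i].name, sizes[j], t * 1e6);
        }
    return 0;
}
```

At 64 B the per-message latency dominates the transfer time, while at 1 MiB both links are essentially bandwidth-bound, which is why latency and message rate, rather than peak bandwidth, separate the interconnects for tightly coupled workloads.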