InfiniBand: Mellanox Hardware & Firmware

Tags: Hardware, HPC, Network, InfiniBand

Published: August 19, 2015
Modified: January 2, 2025

Nvidia InfiniBand Networking Solutions

Switches

Switch   Config.   Ports   Speed
SB7800   fixed     36      EDR
QM87xx   fixed     40      HDR
QS8500   modular   800+    HDR
QM97xx   fixed     64      NDR

Switches come in two configurations…

  • fixed …fixed number of ports
  • modular …gradually expandable with port modules

Switches come in two flavors…

  • managed
    • …MLNX-OS features unlocked
    • …access over SSH, SNMP, HTTPS
    • …enables monitoring and configuration
  • unmanaged
    • …in-band management is possible
    • …status via chassis LEDs

Get information from unmanaged switches with ibswinfo.sh

# requires MST service
>>> ./ibswinfo.sh -d lid-647
=================================================
Quantum Mellanox Technologies
=================================================
part number        | MQM8790-HS2F
serial number      | MT2202X19243
product name       | Jaguar Unmng IB 200
revision           | AK
ports              | 80
PSID               | MT_0000000063
GUID               | 0x1070fd030003af98
firmware version   | 27.2008.3328
-------------------------------------------------
uptime (d-h:m:s)   | 26d-20:16:01
-------------------------------------------------
PSU0 status        | OK
     P/N           | MTEF-PSF-AC-C
     S/N           | MT2202X18887
     DC power      | OK
     fan status    | OK
     power (W)     | 165
PSU1 status        | OK
     P/N           | MTEF-PSF-AC-C
     S/N           | MT2202X18881
     DC power      | OK
     fan status    | OK
     power (W)     | 148
-------------------------------------------------
temperature (C)    | 63
max temp (C)       | 63
-------------------------------------------------
fan status         | OK
fan#1 (rpm)        | 5959
fan#2 (rpm)        | 5251
fan#3 (rpm)        | 6013
fan#4 (rpm)        | 5251
fan#5 (rpm)        | 5906
fan#6 (rpm)        | 5293
fan#7 (rpm)        | 6125
fan#8 (rpm)        | 5293
fan#9 (rpm)        | 5959
-------------------------------------------------
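The report above is a list of `name | value` lines, so single values can be pulled out for monitoring. The following is a minimal sketch under that assumption; `ibsw_field` is a hypothetical helper (not part of ibswinfo.sh), and the sample lines are copied from the output above.

```shell
# Print the value of one field from "name | value" lines read on stdin.
# ibsw_field is a hypothetical helper, not part of ibswinfo.sh.
ibsw_field() {
    awk -F'|' -v key="$1" '
        {
            name = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the field name
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the field value
            if (name == key) { print value; exit }
        }'
}

# Sample lines taken from the report above
sample='part number        | MQM8790-HS2F
firmware version   | 27.2008.3328
temperature (C)    | 63
fan status         | OK'

printf '%s\n' "$sample" | ibsw_field 'temperature (C)'
printf '%s\n' "$sample" | ibsw_field 'firmware version'
```

On a host with the MST service running, the same helper could presumably be fed directly, for example `./ibswinfo.sh -d lid-647 | ibsw_field 'temperature (C)'`.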

Ethernet Gateway

Skyway InfiniBand to Ethernet gateway…

  • MLNX-GW (gateway operating system) appliance
  • 16 ports (8x InfiniBand EDR/HDR + 8x Ethernet 100/200 Gb/s)
  • Max. bandwidth 1.6 Tb/s
  • High availability & load balancing

…achieved by leveraging Ethernet LAG (Link Aggregation). LACP (Link Aggregation Control Protocol) is used to establish the LAG and to verify connectivity…

Cables

Cable part numbers…

Cable           Speed   Type   Split   Length (m)
MC2207130       FDR     DAC    no      0.5, 1, 1.5, 2
MC220731V       FDR     AOC    no      3, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100
MCP1600-E       EDR     DAC    no      0.5, 1, 1.5, 2, 2.5, 3, 4, 5
MFA1A00-E       EDR     AOC    no      3, 5, 10, 15, 20, 30, 50, 100
MCP1650-H       HDR     DAC    no      0.5, 1, 1.5, 2
MCP7H50-H       HDR     DAC    yes     1, 1.5, 2
MCA1J00-H       HDR     ACC    no      3, 4
MCA7J50-H       HDR     ACC    yes     3, 4
MFS1S00-HxxxE   HDR     AOC    no      3, 5, 10, 15, 20, 30, 50, 100, 130, 150
MFS1S50-HxxxE   HDR     AOC    yes     3, 5, 10, 15, 20, 30

LinkX product family for Mellanox cables and transceivers

  • DAC, (passive) direct attach copper
    • low price
    • up to 2 meters (at HDR)
    • simple copper wires
    • no electronics
    • consume (almost) zero power
    • lowest latency
  • ACC, active copper cables (aka active DAC)
    • consume 4 to 5 watts
    • include signal-boosting integrated circuits (ICs)
    • extend the reach up to 4 meters (at 200G HDR)
  • AOC, active optical cables
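The reach limits in the list above can be turned into a simple selection rule for HDR links. This is only a sketch: `hdr_cable_type` is a hypothetical helper, and the thresholds (2 m for DAC, 4 m for ACC) are the HDR figures quoted above, with AOC assumed for anything longer.

```shell
# Suggest a LinkX cable type for an HDR link of the given length in meters.
# hdr_cable_type is a hypothetical helper; thresholds are the HDR reach
# limits from the list above (DAC up to 2 m, ACC up to 4 m, otherwise AOC).
hdr_cable_type() {
    awk -v len="$1" 'BEGIN {
        if (len <= 2)      print "DAC"
        else if (len <= 4) print "ACC"
        else               print "AOC"
    }'
}

hdr_cable_type 1.5
hdr_cable_type 3
hdr_cable_type 30
```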

DAC-in-a-Rack connects servers and storage to top-of-rack (TOR) switches

(passive/active) splitter cables

  • DAC/ACC
    • typically used to connect HDR100 HCAs to an HDR TOR switch
    • enabling a 40-port HDR switch to support 80 ports of 100G HDR100
    • 1:2 splitter breakout cable in DAC copper… (QSFP56 to 2xQSFP56)
  • AOC …1:2 splitter optical breakout cable… (QSFP56 to 2xQSFP56)
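The port arithmetic behind the splitter cables can be written out as follows. It is a small worked example; the function names are hypothetical, and the 200 Gb/s HDR rate is the figure used elsewhere in these notes.

```shell
# Number of end-node ports when every switch port uses a 1:N splitter cable.
split_ports() {
    echo $(( $1 * $2 ))
}

# Bandwidth per end-node port in Gb/s for a given switch port rate.
split_rate() {
    echo $(( $1 / $2 ))
}

switch_ports=40   # ports on a fixed HDR switch (from the text)
split_factor=2    # 1:2 splitter cable
hdr_rate=200      # Gb/s per HDR port

echo "HDR100 ports: $(split_ports "$switch_ports" "$split_factor")"
echo "Rate per port: $(split_rate "$hdr_rate" "$split_factor") Gb/s"
```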

Firmware

MFT (Mellanox firmware tools)…

Installation …MLNX_OFED includes the required packages…

dnf install -y mft kmod-kernel-mft-mlnx usbutils

…packages include an init-script…

systemctl start mst.service

Devices

…can be accessed by their PCI ID

# ...find PCI ID using lspci
>>> lspci -d 15b3:
21:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]

# ...query the firmware on a device using the PCI ID
>>> mstflint -d 21:00.0 query
Image type:            FS4
FW Version:            20.32.1010
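With several adapters in one host, the two steps above can be chained. The sketch below only parses text in the shape shown above; `mlx_pci_ids` and `mstflint_fw_version` are hypothetical helpers, and the sample lines are copied from the transcript.

```shell
# Extract PCI addresses from `lspci -d 15b3:` style output on stdin.
mlx_pci_ids() {
    awk 'NF { print $1 }'
}

# Extract the firmware version from `mstflint ... query` style output on stdin.
mstflint_fw_version() {
    awk -F': *' '$1 == "FW Version" { print $2; exit }'
}

# Sample lines taken from the transcript above
lspci_sample='21:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]'
query_sample='Image type:            FS4
FW Version:            20.32.1010'

printf '%s\n' "$lspci_sample" | mlx_pci_ids
printf '%s\n' "$query_sample" | mstflint_fw_version

# On a host with the tools installed, a loop could look like this:
#   for id in $(lspci -d 15b3: | mlx_pci_ids); do
#       echo "$id $(mstflint -d "$id" query | mstflint_fw_version)"
#   done
```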

…when the IB driver is loaded…access a device by its device name…

# ...find the device name
>>> ibv_devinfo | grep hca_id
hca_id: mlx5_0

# ...query the firmware on a device using the device name
>>> mstflint -d mlx5_0 query
...

PSID (Parameter-Set IDentification) of the channel adapter…

>>> mlxfwmanager --query | grep PSID
  PSID:             SM_2121000001000
  • …PSID used to download the correct firmware for a device
  • …Mellanox PSIDs start with MT_; SM_ or AS_ indicates a vendor re-labeled card
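The prefix rule can be expressed as a small classifier, for example to flag re-labeled cards before attempting an update. `psid_origin` is a hypothetical helper; it covers only the three prefixes named above and reports anything else as unknown.

```shell
# Classify a PSID by its prefix (only the prefixes mentioned above).
psid_origin() {
    case "$1" in
        MT_*)       echo "Mellanox" ;;
        SM_*|AS_*)  echo "vendor re-labeled" ;;
        *)          echo "unknown" ;;
    esac
}

psid_origin MT_0000000222
psid_origin SM_2121000001000
```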

mlxconfig

Reboot for configuration changes to take effect

Change device configurations without reburning the firmware…

# ...only a single device is present...
mlxconfig query | grep LINK
         PHY_COUNT_LINK_UP_DELAY             DELAY_NONE(0)   
         LINK_TYPE_P1                        IB(1)           
         KEEP_ETH_LINK_UP_P1                 True(1)         
         KEEP_IB_LINK_UP_P1                  False(0)        
         KEEP_LINK_UP_ON_BOOT_P1             True(1)         
         KEEP_LINK_UP_ON_STANDBY_P1          False(0)        
         AUTO_POWER_SAVE_LINK_DOWN_P1        False(0)        
         UNKNOWN_UPLINK_MAC_FLOOD_P1         False(0)
# ...set configuration
mlxconfig -d $device set KEEP_IB_LINK_UP_P1=0 KEEP_LINK_UP_ON_BOOT_P1=1

Reset the device configuration to default…

mlxconfig -d $device reset
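Because a changed setting only takes effect after a reboot, it can be worth checking the current value before calling `set`. A minimal sketch, assuming the two-column `NAME VALUE(n)` layout of the query output shown above; `mlxconfig_value` is a hypothetical helper and the sample lines are copied from that output.

```shell
# Print the current value of one parameter from `mlxconfig query` output on stdin.
mlxconfig_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Sample lines taken from the query output above
sample='         LINK_TYPE_P1                        IB(1)
         KEEP_IB_LINK_UP_P1                  False(0)
         KEEP_LINK_UP_ON_BOOT_P1             True(1)'

current=$(printf '%s\n' "$sample" | mlxconfig_value KEEP_IB_LINK_UP_P1)

# Only a value other than 0 would require a `set` followed by a reboot
case "$current" in
    *"(0)") echo "KEEP_IB_LINK_UP_P1 already disabled" ;;
    *)      echo "KEEP_IB_LINK_UP_P1 needs to be set to 0" ;;
esac
```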

mlxfwmanager

Updating Firmware After Installation …requires a reboot:

>>> mlxfwmanager --online -u
#...
  PSID:             MT_0000000222
  Versions:         Current        Available     
     FW             20.32.1010     20.35.1012    
     PXE            3.6.0502       3.6.0804      
     UEFI           14.25.0017     14.28.0015
#...
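For automation it can help to decide from the version table whether an update (and therefore a reboot) is pending. The sketch below assumes the `Current`/`Available` column layout shown above; `fw_update_pending` is a hypothetical helper that only checks whether the two FW versions differ, not which one is newer.

```shell
# Exit status 0 if the FW row on stdin lists different current and
# available versions, 1 otherwise (including when no FW row is found).
fw_update_pending() {
    awk '$1 == "FW" && NF >= 3 {
             found = 1
             if (($2 "") != ($3 "")) differs = 1   # force string comparison
         }
         END { exit !(found && differs) }'
}

# Sample lines taken from the output above
sample='  PSID:             MT_0000000222
  Versions:         Current        Available
     FW             20.32.1010     20.35.1012
     PXE            3.6.0502       3.6.0804'

if printf '%s\n' "$sample" | fw_update_pending; then
    echo "firmware update pending"
else
    echo "firmware matches the available version"
fi
```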

Decouple from the login terminal (useful for automation over SSH)

log_file=/var/log/mlxfwmanager-$(date +%Y%m%d).log
# ...append both stdout and stderr to the log file
nohup mlxfwmanager --no-progress --online --update --yes >> "$log_file" 2>&1 & disown

mst

mst stops and starts the access driver for Linux

Example from a Supermicro board, starting the driver and listing the devices before a firmware update:

>>> mst start && mst status -v
DEVICE_TYPE             MST                           PCI       RDMA    NET                 NUMA  
ConnectX2(rev:b0)       /dev/mst/mt26428_pciconf0     
ConnectX2(rev:b0)       /dev/mst/mt26428_pci_cr0      02:00.0   mlx4_0  net-ib0  
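The verbose status table can be reduced to a mapping between MST device, PCI address and RDMA device name. This is a sketch that assumes the whitespace-separated column layout shown above; `mst_pci_map` is a hypothetical helper, and rows without a PCI address (such as the pciconf entry) are skipped.

```shell
# Print "MST-device PCI-address RDMA-name" for each row of `mst status -v`
# output on stdin that carries a PCI address; the header row is skipped.
mst_pci_map() {
    awk '$1 != "DEVICE_TYPE" && NF >= 4 { print $2, $3, $4 }'
}

# Sample lines taken from the output above
sample='DEVICE_TYPE             MST                           PCI       RDMA    NET                 NUMA
ConnectX2(rev:b0)       /dev/mst/mt26428_pciconf0
ConnectX2(rev:b0)       /dev/mst/mt26428_pci_cr0      02:00.0   mlx4_0  net-ib0'

printf '%s\n' "$sample" | mst_pci_map
```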

mlxcables

…work against the cables connected to the devices on the machine…

  • mst cable add…discover the cables that are connected to the local devices
  • mlxcables…access the cables…
    • …get cable IDs…
    • …upgrade firmware on the cables
>>> mlxcables -q
...
Cable name    : mt4123_pciconf0_cable_0
...
Identifier      : QSFP28 (11h)
Technology      : Copper cable unequalized (a0h)
Compliance      : 50GBASE-CR, ... HDR,EDR,FDR,QDR,DDR,SDR
...
Vendor          : Mellanox        
Serial number   : MT2214VS04725   
Part number     : MCP7H50-H01AR30 
...
Length [m]      : 1 m
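For an inventory of installed cables, individual fields of the query output can be extracted by name. A minimal sketch, assuming the `name : value` layout shown above; `cable_field` is a hypothetical helper and the sample lines are copied from that output.

```shell
# Print the value of one "name : value" field from `mlxcables -q` output on stdin.
# The line is split at the first colon only, so values may contain colons.
cable_field() {
    awk -v key="$1" '
        {
            pos = index($0, ":")
            if (pos == 0) next
            name  = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Sample lines taken from the output above
sample='Cable name    : mt4123_pciconf0_cable_0
Identifier      : QSFP28 (11h)
Vendor          : Mellanox
Part number     : MCP7H50-H01AR30
Length [m]      : 1 m'

printf '%s\n' "$sample" | cable_field 'Part number'
printf '%s\n' "$sample" | cable_field 'Length [m]'
```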