InfiniBand: Mellanox Hardware & Firmware
Nvidia InfiniBand Networking Solutions
Switches
Switch | Config. | Ports | Speed |
---|---|---|---|
SB7800 | fixed | 36 | EDR |
QM87xx | fixed | 40 | HDR |
CS8500 | modular | 800+ | HDR |
QM97xx | fixed | 64 | NDR |
Switches come in two configurations…
- …fixed …number of ports
- …modular …gradually expandable port modules
Switches come in two flavors…
- …managed …
- …MLNX-OS features unlocked
- …access over SSH, SNMP, HTTPS
- …enables monitoring and configuration
- …unmanaged …
- …in-band management is possible
- …status via chassis LEDs
Get information from unmanaged switches with ibswinfo.sh
# requires MST service
>>> ./ibswinfo.sh -d lid-647
=================================================
Quantum Mellanox Technologies
=================================================
part number | MQM8790-HS2F
serial number | MT2202X19243
product name | Jaguar Unmng IB 200
revision | AK
ports | 80
PSID | MT_0000000063
GUID | 0x1070fd030003af98
firmware version | 27.2008.3328
-------------------------------------------------
uptime (d-h:m:s) | 26d-20:16:01
-------------------------------------------------
PSU0 status | OK
P/N | MTEF-PSF-AC-C
S/N | MT2202X18887
DC power | OK
fan status | OK
power (W) | 165
PSU1 status | OK
P/N | MTEF-PSF-AC-C
S/N | MT2202X18881
DC power | OK
fan status | OK
power (W) | 148
-------------------------------------------------
temperature (C) | 63
max temp (C) | 63
-------------------------------------------------
fan status | OK
fan#1 (rpm) | 5959
fan#2 (rpm) | 5251
fan#3 (rpm) | 6013
fan#4 (rpm) | 5251
fan#5 (rpm) | 5906
fan#6 (rpm) | 5293
fan#7 (rpm) | 6125
fan#8 (rpm) | 5293
fan#9 (rpm) | 5959
-------------------------------------------------
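The `key | value` rows printed above are easy to pick apart in a monitoring script. The following is a minimal sketch, assuming the output layout shown in this example; the function names `ibswinfo_field` and `ibswinfo_too_hot` are hypothetical helpers and not part of ibswinfo.sh.

```shell
# Print the value of one "key | value" row from ibswinfo.sh output on stdin.
# Only the first matching row is printed, because keys such as "fan status"
# appear once per PSU and once for the chassis.
ibswinfo_field() {
    awk -F'|' -v key="$1" '
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            if (k == key) { print v; exit }
        }'
}

# Succeed (exit 0) when the reported temperature is above a limit in Celsius.
ibswinfo_too_hot() {
    temp=$(ibswinfo_field "temperature (C)")
    [ -n "$temp" ] && [ "$temp" -gt "$1" ]
}
```

For example, `./ibswinfo.sh -d lid-647 | ibswinfo_field "firmware version"` would print only the firmware version, and `./ibswinfo.sh -d lid-647 | ibswinfo_too_hot 70` could drive an alert.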
Ethernet Gateway
Skyway InfiniBand to Ethernet gateway…
- MLNX-GW (gateway operating system) appliance
- 16x ports (8 InfiniBand EDR/HDR + 8 Ethernet 100/200Gb/s)
- Max. bandwidth 1.6Tb/s
- High availability & load balancing
…achieved by leveraging Ethernet LAG (Link Aggregation). LACP (Link Aggregation Control Protocol) is used to establish the LAG and to verify connectivity…
Cables
Cable part numbers…
Cable | Speed | Type | Split | Length |
---|---|---|---|---|
MC2207130 | FDR | DAC | no | .5, 1, 1.5, 2 |
MC220731V | FDR | AOC | no | 3, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100 |
MCP1600-E | EDR | DAC | no | .5, 1, 1.5, 2, 2.5, 3, 4, 5 |
MFA1A00-E | EDR | AOC | no | 3, 5, 10, 15, 20, 30, 50, 100 |
MCP1650-H | HDR | DAC | no | .5, 1, 1.5, 2 |
MCP7H50-H | HDR | DAC | yes | 1, 1.5, 2 |
MCA1J00-H | HDR | ACC | no | 3, 4 |
MCA7J50-H | HDR | ACC | yes | 3, 4 |
MFS1S00-HxxxE | HDR | AOC | no | 3, 5, 10, 15, 20, 30, 50, 100, 130, 150 |
MFS1S50-HxxxE | HDR | AOC | yes | 3, 5, 10, 15, 20, 30 |
LinkX product family for Mellanox cables and transceivers
- DAC, (passive) direct attach copper
- low price
- up to 2 meters (at HDR)
- simple copper wires
- no electronics
- consume (almost) zero power
- lowest latency
- ACC, active copper cables (aka active DAC)
- consumes 4 to 5 Watts
- include signal-boosting integrated circuits (ICs)
- extend the reach up to 4 meters (at 200G HDR)
- AOC, active optical cables
DAC-in-a-Rack connects servers and storage to top-of-rack (TOR) switches
(passive/active) splitter cables…
- DAC/ACC
- typically used to connect HDR100 HCAs to an HDR TOR switch
- enabling a 40-port HDR switch to support 80 ports of 100G HDR100
- 1:2 splitter breakout cable in DAC copper… (QSFP56 to 2xQSFP56)
- AOC …1:2 splitter optical breakout cable… (QSFP56 to 2xQSFP56)
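The port arithmetic behind splitter cables can be written down as a one-liner. This is only an illustrative sketch; `split_ports` is a hypothetical helper, and it assumes every switch port is broken out with the same split factor.

```shell
# Number of end-node ports and per-port speed when every switch port
# is broken out with a 1:N splitter cable.
# Usage: split_ports <switch ports> <port speed in Gb/s> <split factor>
split_ports() {
    echo "$(( $1 * $3 )) ports at $(( $2 / $3 )) Gb/s"
}
```

`split_ports 40 200 2` reproduces the case above: a 40-port HDR (200 Gb/s) switch with 1:2 splitters yields 80 ports at 100 Gb/s.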
Firmware
MFT (Mellanox firmware tools)…
- Interface with the HCA firmware…
- …query firmware information
- …customize firmware images
- …burn firmware image to a device
- Configuration… /etc/mft
Installation …MLNX_OFED includes the required packages…
dnf install -y mft kmod-kernel-mft-mlnx usbutils
…packages include an init-script…
systemctl start mst.service
Devices
…can be accessed by their PCI ID
# ...find PCI ID using lspci
>>> lspci -d 15b3:
21:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
# ...query the firmware on a device using the PCI ID
>>> mstflint -d 21:00.0 query
Image type: FS4
FW Version: 20.32.1010
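On nodes with more than one adapter, the two steps above can be chained. A minimal sketch, assuming `lspci` and `mstflint` are installed and the script runs as root; `mlx_pci_addrs` and `query_all_fw` are hypothetical helper names.

```shell
# Read `lspci -d 15b3:` output on stdin and print one PCI address per line.
# 15b3 is the PCI vendor ID used in the lspci call above.
mlx_pci_addrs() {
    awk 'NF { print $1 }'
}

# Query the firmware of every adapter found on this node.
query_all_fw() {
    lspci -d 15b3: | mlx_pci_addrs | while read -r addr; do
        echo "== $addr"
        mstflint -d "$addr" query
    done
}
```

Dual-port adapters usually appear as two PCI functions, so the same firmware may be reported twice.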
…when the IB driver is loaded…access a device by device name…
# ...find the device name
>>> ibv_devinfo | grep hca_id
hca_id: mlx5_0
# ...query the firmware on a device using the device name
>>> mstflint -d mlx5_0 query
...
PSID (Parameter-Set IDentification) of the channel adapter…
>>> mlxfwmanager --query | grep PSID
PSID: SM_2121000001000
- …PSID used to download the correct firmware for a device
- …Mellanox PSIDs start with MT_; SM_ or AS_ indicate vendor re-labeled cards
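A script that decides where to fetch firmware from can branch on the PSID prefix. The sketch below assumes that `MT_` marks Mellanox-branded cards and that `SM_` and `AS_` mark re-labeled ones, as described above; other vendor prefixes exist and fall through to `unknown`. The function name `psid_origin` is hypothetical.

```shell
# Classify a PSID by its prefix.
psid_origin() {
    case "$1" in
        MT_*)       echo "mellanox" ;;
        SM_*|AS_*)  echo "relabeled" ;;
        *)          echo "unknown" ;;
    esac
}
```

For the card queried above, `psid_origin SM_2121000001000` prints `relabeled`, which suggests the firmware has to come from the board vendor.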
mlxconfig
Reboot for configuration changes to take effect
Change device configurations without reburning the firmware…
# ...only a single device is present...
mlxconfig query | grep LINK
PHY_COUNT_LINK_UP_DELAY DELAY_NONE(0)
LINK_TYPE_P1 IB(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 True(1)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
# ...set configuration
mlxconfig -d $device set KEEP_IB_LINK_UP_P1=0 KEEP_LINK_UP_ON_BOOT_P1=1
Reset the device configuration to default…
mlxconfig -d $device reset
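Since every change needs a reboot, it can help to read the current value first and only set a parameter when it differs. A minimal sketch, assuming the two-column `NAME VALUE(n)` layout shown in the query output above; `mlxconfig_value` and `mlxconfig_number` are hypothetical helpers.

```shell
# Print the current value of one parameter from `mlxconfig query` output
# on stdin, e.g. "IB(1)" for LINK_TYPE_P1.
mlxconfig_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Print only the numeric part in parentheses, e.g. "1" from "IB(1)".
mlxconfig_number() {
    mlxconfig_value "$1" | sed 's/.*(\(.*\))/\1/'
}
```

A guarded update could then look like `[ "$(mlxconfig -d "$device" query | mlxconfig_number KEEP_IB_LINK_UP_P1)" = 0 ] || mlxconfig -d "$device" set KEEP_IB_LINK_UP_P1=0`.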
mlxfwmanager
Updating Firmware After Installation …requires a reboot:
>>> mlxfwmanager --online -u
#...
PSID: MT_0000000222
Versions: Current Available
FW 20.32.1010 20.35.1012
PXE 3.6.0502 3.6.0804
UEFI 14.25.0017 14.28.0015
#...
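For automation it is useful to know whether an update is pending before scheduling a reboot. The sketch below parses the `FW` row of the version table shown above; the helper names are hypothetical, and treating `N/A` in the second column as "nothing available" is an assumption about the tool's output.

```shell
# Read mlxfwmanager output on stdin and print "<current> <available>"
# taken from the first FW row of the version table.
fw_versions() {
    awk '$1 == "FW" && NF >= 3 { print $2, $3; exit }'
}

# Succeed (exit 0) when a different firmware version is offered.
fw_update_available() {
    # Word splitting is intended: two fields become $1 and $2.
    set -- $(fw_versions)
    [ $# -eq 2 ] && [ "$2" != "N/A" ] && [ "$1" != "$2" ]
}
```

Used as `mlxfwmanager --online | fw_update_available && echo "update pending"`.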
Decouple from the login terminal (useful for automation over SSH)
log_file=/var/log/mlxfwmanager-$(date +%Y%m%d).log
nohup mlxfwmanager --no-progress --online --update --yes >> "$log_file" 2>&1 & disown
mst
mst stops and starts the access driver for Linux
Example of updating the firmware on Super Micro boards:
>>> mst start && mst status -v
DEVICE_TYPE MST PCI RDMA NET NUMA
ConnectX2(rev:b0) /dev/mst/mt26428_pciconf0
ConnectX2(rev:b0) /dev/mst/mt26428_pci_cr0 02:00.0 mlx4_0 net-ib0
mlxcables
…work against the cables connected to the devices on the machine…
mst cable add …discover the cables that are connected to the local devices
mlxcables …access the cables…
- …get cable IDs…
- …upgrade firmware on the cables
>>> mlxcables -q
...
Cable name : mt4123_pciconf0_cable_0
...
Identifier : QSFP28 (11h)
Technology : Copper cable unequalized (a0h)
Compliance : 50GBASE-CR, ... HDR,EDR,FDR,QDR,DDR,SDR
...
Vendor : Mellanox
Serial number : MT2214VS04725
Part number : MCP7H50-H01AR30
...
Length [m] : 1 m
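The part number reported by `mlxcables -q` can be mapped back to the cable table in the Cables section. A minimal sketch, assuming the `Name : value` layout shown above and only the part-number prefixes listed in that table; both function names are hypothetical.

```shell
# Print the part number from `mlxcables -q` output on stdin.
cable_part_number() {
    awk -F':' '/^[ \t]*Part number/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim the value
        print $2; exit
    }'
}

# Map a part number to its cable type, following the cable table
# in this document (prefixes not listed there print "unknown").
cable_type() {
    case "$1" in
        MC2207130*|MCP*)       echo "DAC" ;;
        MCA*)                  echo "ACC" ;;
        MC220731V*|MFA*|MFS*)  echo "AOC" ;;
        *)                     echo "unknown" ;;
    esac
}
```

For the cable above, `cable_type "$(mlxcables -q | cable_part_number)"` would print `DAC`, matching the reported copper technology.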
mlxlink
mlxlink …check and debug link status
>>> mlxlink -d mlx5_0 --show_device
Operational Info
----------------
State : Active
Physical state : LinkUp
Speed : IB-SDR
Width : 2x
FEC : No FEC
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed : 0x00000001 (SDR)
Supported Cable Speed : 0x0000007f (HDR,EDR,FDR,FDR10,QDR,DDR,SDR)
...
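The example above shows a link that came up at SDR although the cable supports HDR, which is the kind of degradation worth catching automatically. A minimal sketch, assuming plain-text `Name : value` rows without color escape sequences; `mlxlink_field` and `link_speed_ok` are hypothetical helpers.

```shell
# Print the value of one "Name : value" row from mlxlink output on stdin.
# The key must match exactly, so "Speed" does not match
# "Enabled Link Speed" or "Supported Cable Speed".
mlxlink_field() {
    awk -F':' -v key="$1" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, ":") + 1)   # everything after the first colon
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v; exit
            }
        }'
}

# Succeed (exit 0) when the link runs at the expected InfiniBand speed,
# given as SDR, EDR, HDR, ...
link_speed_ok() {
    speed=$(mlxlink_field "Speed")
    [ "$speed" = "IB-$1" ]
}
```

Used as `mlxlink -d mlx5_0 | link_speed_ok HDR || echo "link degraded"`.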