InfiniBand: Mellanox Hardware & Firmware
Nvidia InfiniBand Networking Solutions
Switches
Switch | Config. | Ports | Speed |
---|---|---|---|
SB7800 | fixed | 36 | EDR |
QM87xx | fixed | 40 | HDR |
QS8500 | modular | 800+ | HDR |
QM97xx | fixed | 64 | NDR |
Switches come in two configurations…
- …fixed …number of ports
- …modular …gradually expandable port modules
Switches come in two flavors…
- …managed …
- …MLNX-OS features unlocked
- …access over SSH, SNMP, HTTPS
- …enables monitoring and configuration
- …unmanaged …
- …in-band management is possible
- …status via chassis LEDs
Get information from unmanaged switches with ibswinfo.sh
# requires MST service
>>> ./ibswinfo.sh -d lid-647
=================================================
Quantum Mellanox Technologies
=================================================
part number | MQM8790-HS2F
serial number | MT2202X19243
product name | Jaguar Unmng IB 200
revision | AK
ports | 80
PSID | MT_0000000063
GUID | 0x1070fd030003af98
firmware version | 27.2008.3328
-------------------------------------------------
uptime (d-h:m:s) | 26d-20:16:01
-------------------------------------------------
PSU0 status | OK
P/N | MTEF-PSF-AC-C
S/N | MT2202X18887
DC power | OK
fan status | OK
power (W) | 165
PSU1 status | OK
P/N | MTEF-PSF-AC-C
S/N | MT2202X18881
DC power | OK
fan status | OK
power (W) | 148
-------------------------------------------------
temperature (C) | 63
max temp (C) | 63
-------------------------------------------------
fan status | OK
fan#1 (rpm) | 5959
fan#2 (rpm) | 5251
fan#3 (rpm) | 6013
fan#4 (rpm) | 5251
fan#5 (rpm) | 5906
fan#6 (rpm) | 5293
fan#7 (rpm) | 6125
fan#8 (rpm) | 5293
fan#9 (rpm) | 5959
-------------------------------------------------
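For monitoring, individual values can be pulled out of the `key | value` report with standard tools. The following is a minimal sketch that assumes the output layout shown above; the helper name `ibsw_field` is hypothetical and not part of ibswinfo.sh.

```shell
# ibsw_field KEY: read an ibswinfo.sh report on stdin and print the value of
# every line whose key column equals KEY (hypothetical helper).
# Keys that occur more than once (e.g. "fan status") yield one line per match.
ibsw_field() {
    awk -F'|' -v key="$1" '
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key column
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value column
            if (k == key) print v
        }'
}
```

Possible usage: `./ibswinfo.sh -d lid-647 | ibsw_field "temperature (C)"`.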
Ethernet Gateway
Skyway InfiniBand to Ethernet gateway…
- MLNX-GW (gateway operating system) appliance
- 16x ports (8x InfiniBand EDR/HDR + 8x Ethernet 100/200 Gb/s)
- Max. bandwidth 1.6 Tb/s
- High availability & load balancing
…achieved by leveraging Ethernet LAG (Link Aggregation). LACP (Link Aggregation Control Protocol) is used to establish the LAG and to verify connectivity…
Cables
Cable part numbers…
Cable | Speed | Type | Split | Length |
---|---|---|---|---|
MC2207130 | FDR | DAC | no | .5, 1, 1.5, 2 |
MC220731V | FDR | AOC | no | 3, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100 |
MCP1600-E | EDR | DAC | no | .5, 1, 1.5, 2, 2.5, 3, 4, 5 |
MFA1A00-E | EDR | AOC | no | 3, 5, 10, 15, 20, 30, 50, 100 |
MCP1650-H | HDR | DAC | no | .5, 1, 1.5, 2 |
MCP7H50-H | HDR | DAC | yes | 1, 1.5, 2 |
MCA1J00-H | HDR | ACC | no | 3, 4 |
MCA7J50-H | HDR | ACC | yes | 3, 4 |
MFS1S00-HxxxE | HDR | AOC | no | 3, 5, 10, 15, 20, 30, 50, 100, 130, 150 |
MFS1S50-HxxxE | HDR | AOC | yes | 3, 5, 10, 15, 20, 30 |
LinkX product family for Mellanox cables and transceivers
- DAC, (passive) direct attach copper
- low price
- up to 2 meters (at HDR)
- simple copper wires
- no electronics
- consume (almost) zero power
- lowest latency
- ACC, active copper cables (aka active DAC)
- consumes 4 to 5 Watts
- include signal-boosting integrated circuits (ICs)
- extend the reach up to 4 meters (at 200G HDR)
- AOC, active optical cables
DAC-in-a-Rack …connects servers and storage to top-of-rack (TOR) switches
(passive/active) splitter cables…
- DAC/ACC
- typically used to connect HDR100 HCAs to an HDR TOR switch
- enabling a 40-port HDR switch to support 80 ports of 100G HDR100
- 1:2 splitter breakout cable in DAC copper… (QSFP56 to 2xQSFP56)
- AOC …1:2 splitter optical breakout cable… (QSFP56 to 2xQSFP56)
Firmware
MFT (Mellanox firmware tools)…
- Interface with the HCA firmware…
- …query firmware information
- …customize firmware images
- …burn firmware image to a device
- Configuration… /etc/mft
- References
Installation …MLNX_OFED includes the required packages…
dnf install -y mft kmod-kernel-mft-mlnx usbutils
…packages include an init-script…
systemctl start mst.service
Devices
…can be accessed by their PCI ID
# ...find PCI ID using lspci
>>> lspci -d 15b3:
21:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
# ...query the firmware on a device using the PCI ID
>>> mstflint -d 21:00.0 query
Image type: FS4
FW Version: 20.32.1010
…when the IB driver is loaded …access a device by its device name…
# ...find the device name
>>> ibv_devinfo | grep hca_id
hca_id: mlx5_0
# ...query the firmware on a device using the device name
>>> mstflint -d mlx5_0 query
...
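On hosts with several adapters the PCI lookup and the firmware query can be chained. The following is a rough sketch; the function names `mlx_pci_ids` and `query_all_mlx_fw` are hypothetical, and the loop assumes `lspci` and `mstflint` are installed and run with sufficient privileges.

```shell
# mlx_pci_ids: read `lspci -d 15b3:` output on stdin and print the PCI
# address (first column) of each device, one per line.
mlx_pci_ids() {
    awk '{ print $1 }'
}

# query_all_mlx_fw: query the firmware of every device with the Mellanox
# PCI vendor ID 15b3 (hypothetical wrapper around the commands above).
query_all_mlx_fw() {
    lspci -d 15b3: | mlx_pci_ids | while read -r pci; do
        echo "== $pci"
        mstflint -d "$pci" query
    done
}
```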
PSID (Parameter-Set IDentification) of the channel adapter…
>>> mlxfwmanager --query | grep PSID
PSID: SM_2121000001000
- …PSID used to download the correct firmware for a device
- …Mellanox PSIDs start with MT_ …other prefixes such as SM_ or AS_ indicate vendor re-labeled cards
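The prefix rule can be expressed as a small check, for example to flag re-labeled cards before looking for firmware. This sketch assumes that `MT_` marks stock Mellanox cards and that every other prefix means a vendor re-labeled card; the function name `psid_origin` is hypothetical.

```shell
# psid_origin PSID: print "mellanox" for PSIDs starting with MT_,
# "re-labeled" for anything else (assumption: only MT_ is stock Mellanox).
psid_origin() {
    case "$1" in
        MT_*) echo "mellanox" ;;
        *)    echo "re-labeled" ;;
    esac
}
```

With the PSIDs shown in this section, `psid_origin SM_2121000001000` reports a re-labeled card, while `psid_origin MT_0000000222` reports a Mellanox card.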
mlxconfig
Reboot for configuration changes to take effect
Change device configurations without reburning the firmware…
# ...only a single device is present...
mlxconfig query | grep LINK
PHY_COUNT_LINK_UP_DELAY DELAY_NONE(0)
LINK_TYPE_P1 IB(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 True(1)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
# ...set configuration
mlxconfig -d $device set KEEP_IB_LINK_UP_P1=0 KEEP_LINK_UP_ON_BOOT_P1=1
Reset the device configuration to default…
mlxconfig -d $device reset
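Since changes only take effect after a reboot, it can help to read the current value of a parameter before setting it. The following is a minimal sketch that assumes the two-column `mlxconfig query` layout shown above; the helper name `mlxcfg_value` is hypothetical.

```shell
# mlxcfg_value PARAM: read `mlxconfig query` output on stdin and print the
# value column of PARAM, e.g. "IB(1)" for LINK_TYPE_P1 (hypothetical helper).
mlxcfg_value() {
    awk -v param="$1" '$1 == param { print $2 }'
}
```

Possible usage: `mlxconfig -d $device query | mlxcfg_value KEEP_IB_LINK_UP_P1`.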
Upgrade Firmware
mlxfwmanager …update HCA firmware [1]
# get the current firmware version
mlxfwmanager --query | grep FW
# update the firmware
>>> mlxfwmanager --online -u
#...
PSID: MT_0000000222
Versions: Current Available
FW 20.32.1010 20.35.1012
PXE 3.6.0502 3.6.0804
UEFI 14.25.0017 14.28.0015
#...
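In automation, the Current and Available columns can be compared to decide whether an update is worth scheduling. The following is a minimal sketch that relies on `sort -V` (version sort, a GNU coreutils option); the function name `fw_update_available` is hypothetical.

```shell
# fw_update_available CURRENT AVAILABLE: succeed (exit status 0) only when
# AVAILABLE is strictly newer than CURRENT according to `sort -V`.
fw_update_available() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}
```

With the versions from the output above, `fw_update_available 20.32.1010 20.35.1012` succeeds; with the arguments swapped, or with two identical versions, it fails.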
Decouple from the login terminal (useful for automation over SSH)
log_file=/var/log/mlxfwmanager-$(date +%Y%m%d).log
nohup mlxfwmanager --no-progress --online --update --yes \
    >> "$log_file" 2>&1 & disown
Downgrade Firmware
Identify the device and download the correct firmware archive [2]:
# identify the device
>>> mlxfwmanager --query | grep -e Part -e PSID
Part Number: MCX653105A-ECA_Ax
PSID: MT_0000000222
# download the firmware
>>> wget https://www.mellanox.com/downloads/firmware/fw-ConnectX6-rel-20_41_1000-MCX653105A-ECA_Ax-UEFI-14.34.12-FlexBoot-3.7.400.bin.zip
# extract the archive
>>> unzip fw-ConnectX6-rel-20_41_1000-MCX653105A-ECA_Ax-UEFI-14.34.12-FlexBoot-3.7.400.bin.zip
Identify the device path…
- …using mst …make sure to select the correct device if there are multiple
- …format of the device name is /dev/mst/mt<dev_id>_pci{_cr0|conf0}
>>> mst start
# find the device name under the column MST
>>> mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
ConnectX6(rev:0) /dev/mst/mt4123_pciconf0 81:00.0 mlx5_0 net-ib0 1
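When several adapters are present, the MST path that belongs to a given RDMA device can be picked from the table instead of being copied by hand. The following is a minimal sketch that assumes the column order shown above (DEVICE_TYPE, MST, PCI, RDMA, …); the helper name `mst_path_for` is hypothetical.

```shell
# mst_path_for RDMA_DEV: read `mst status -v` output on stdin and print the
# MST device path of the row whose RDMA column equals RDMA_DEV.
mst_path_for() {
    awk -v rdma="$1" '$2 ~ /^\/dev\/mst\// && $4 == rdma { print $2 }'
}
```

Possible usage: `mst status -v | mst_path_for mlx5_0`.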
Burn the firmware image:
>>> flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel-20_41_1000-MCX653105A-ECA_Ax-UEFI-14.34.12-FlexBoot-3.7.400.bin burn
Current FW version on flash: 20.43.1014
New FW version: 20.41.1000
Note: The new FW version is older than the current FW version on flash.
Do you want to continue ? (y/n) [n] : y
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Restoring signature - OK
-I- To load new FW run mlxfwreset or reboot machine.
mlxcables
…work against the cables connected to the devices on the machine…
- mst cable add …discover the cables that are connected to the local devices
- mlxcables …access the cables…
  - …get cable IDs…
  - …upgrade firmware on the cables
>>> mlxcables -q
...
Cable name : mt4123_pciconf0_cable_0
...
Identifier : QSFP28 (11h)
Technology : Copper cable unequalized (a0h)
Compliance : 50GBASE-CR, ... HDR,EDR,FDR,QDR,DDR,SDR
...
Vendor : Mellanox
Serial number : MT2214VS04725
Part number : MCP7H50-H01AR30
...
Length [m] : 1 m
mlxlink
mlxlink …check and debug link status
>>> mlxlink -d mlx5_0 --show_device
Operational Info
----------------
State : Active
Physical state : LinkUp
Speed : IB-SDR
Width : 2x
FEC : No FEC
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed : 0x00000001 (SDR)
Supported Cable Speed : 0x0000007f (HDR,EDR,FDR,FDR10,QDR,DDR,SDR)
...
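The sample above shows a link that is active but negotiated only SDR at 2x width over a cable that supports HDR. A check for such degraded links could look like the following sketch, which assumes the plain `name : value` layout shown above; the function names `link_summary` and `link_degraded` are hypothetical, and the expected speed and width strings must be supplied by the caller.

```shell
# link_summary: read mlxlink output on stdin and print "STATE SPEED WIDTH",
# taken from the State, Speed and Width lines of the operational info.
link_summary() {
    awk -F':' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) == "State" { state = trim($2) }
        trim($1) == "Speed" { speed = trim($2) }
        trim($1) == "Width" { width = trim($2) }
        END { print state, speed, width }'
}

# link_degraded SPEED WIDTH: succeed (exit status 0) when the link read from
# stdin is not Active or does not run at the expected SPEED and WIDTH.
link_degraded() {
    summary=$(link_summary)
    [ "$summary" != "Active $1 $2" ]
}
```

Possible usage, with assumed expected values: `mlxlink -d mlx5_0 | link_degraded IB-HDR 4x && echo "link degraded"`.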
Footnotes
1. Firmware Downloads, Nvidia, https://network.nvidia.com/support/firmware
2. Firmware Downloads, Nvidia, https://network.nvidia.com/support/firmware