IPMI Platform Management

Hardware
Published

January 28, 2016

Modified

August 1, 2024

Platform Management…hardware monitoring & control functions…

BMC (Baseboard Management Controller)…

OpenIPMI Linux device drivers, and library (kernel >=3.10 auto-load modules)

IPMItool and GNU FreeIPMI, CLI interfaces (use IPMI-over-LAN or OpenIPMI)

Browser support for WebUI…

apt install openipmi freeipmi-tools ipmitool
lsmod | grep ipmi                                  # list IPMI kernel modules
ipmitool -I lanplus -H <ip> -U ADMIN -a sol activate
ipmi-console -u ADMIN -h <ip>                      # serial over LAN (SOL) console

Configuration

ipmitool mc info                                    # BMC device and firmware information
ipmitool user list 1                                # list user accounts
ipmitool user set name <id> <name>                  # add user 
ipmitool user set password <id>                     # set user password                         
ipmitool lan print | grep -e 'IP Address' -e 'MAC Address' -e 'Gateway IP'
                                                    # show BMC network configuration
ipmitool lan set 1 ipsrc static                     # set static network configuration
ipmitool lan set 1 ipaddr 10.1.2.3                  
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.1.0.1
ipmitool raw 0x30 0x45 0                            # get current fan mode (vendor-specific OEM command)
ipmitool raw 0x30 0x45 1 <NUM>                      # set fan mode to <NUM>
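
The static network settings above can be generated from a single call. The following is a minimal sketch, not a definitive tool: the function names is_ipv4 and bmc_static_lan_commands are hypothetical, and the function only prints the ipmitool commands so they can be reviewed before being run.

```shell
# is_ipv4 ADDR: succeed if ADDR looks like a dotted-quad IPv4 address (hypothetical helper)
is_ipv4() {
    echo "$1" | awk -F'.' '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1 }'
}

# bmc_static_lan_commands CHANNEL IP NETMASK GATEWAY (hypothetical helper)
# Prints the four "ipmitool lan set" commands for a static configuration.
# Nothing is executed; the caller decides whether to run the output.
bmc_static_lan_commands() {
    channel=$1 ip=$2 mask=$3 gw=$4
    for addr in "$ip" "$mask" "$gw"; do
        is_ipv4 "$addr" || { echo "invalid IPv4 address: $addr" >&2; return 1; }
    done
    echo "ipmitool lan set $channel ipsrc static"
    echo "ipmitool lan set $channel ipaddr $ip"
    echo "ipmitool lan set $channel netmask $mask"
    echo "ipmitool lan set $channel defgw ipaddr $gw"
}

# Usage: bmc_static_lan_commands 1 10.1.2.3 255.255.255.0 10.1.0.1 | sh
```

Printing instead of executing keeps the sketch safe to try on a host without a BMC.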

Operation

ipmitool mc reset cold                             # cold-reset the BMC
ipmitool chassis status                            # power state
ipmitool chassis power off|on                      
ipmitool chassis power reset                       # hard reset
ipmitool chassis power cycle                       # power cycle (off, wait at least 1 sec, on)
nodeset-ipmi() { /usr/sbin/ipmipower -u ADMIN -h "${NODES}.mng.devops.test" "$@"; }
nodeset-ipmi -P -[n,f,c,r,s]                       # prompt for password, and
                                                   # o(n) – of(f) – (c)ycle – (r)eset – (s)tat
nodeset-ipmi -s -p [] | grep off | cut -d. -f1 | cut -d- -f2 | nodeset -f
                                                   # find nodes in state off
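
As a stricter alternative to grep, the state can be matched on the exact field. This is a sketch that assumes ipmipower prints one "hostname: state" line per node; nodes_in_state is a hypothetical helper name, and the host names in the usage comment follow the pattern used above.

```shell
# nodes_in_state STATE: read "hostname: state" lines on stdin and print the
# host names whose state equals STATE exactly (hypothetical helper).
nodes_in_state() {
    awk -F': ' -v state="$1" '$2 == state { print $1 }'
}

# Usage: nodeset-ipmi -s | nodes_in_state off
```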

Sensors

Sensors collect data such as temperatures, voltages, and fan status. They are classified by the type of readings they provide and/or the type of events they generate. A sensor returns either an analog or a discrete reading, and its events are either discrete or threshold-based. Sensors and their events are represented by numeric codes defined in the IPMI specification. Print a list of sensors, including their sensor numbers, with ipmi-sensors:

>>> ipmi-sensors -vv | grep -e 'Record ID' -e 'ID String' -e 'Sensor Number'
[…]
ID String: System Temp
Sensor Number: 3
Record ID: 205
[…]

Show a specific sensor record with option --record-ids=:

>>> ipmi-sensors --record-ids=205
ID  | Name        | Type        | Reading    | Units | Event
205 | System Temp | Temperature | 26.00      | C     | 'OK'
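
For scripting, the reading can be pulled out of the pipe-separated table. This is a minimal sketch that assumes the column layout shown above (ID, Name, Type, Reading, Units, Event); sensor_reading is a hypothetical helper name.

```shell
# sensor_reading NAME: read ipmi-sensors table output on stdin and print the
# Reading column of the sensor called NAME (hypothetical helper).
sensor_reading() {
    awk -F'|' -v name="$1" '
        {
            # Columns 2 and 4 hold Name and Reading; strip the padding
            n = $2; r = $4
            gsub(/^[ \t]+|[ \t]+$/, "", n)
            gsub(/^[ \t]+|[ \t]+$/, "", r)
            if (n == name) print r
        }'
}

# Usage: ipmi-sensors --record-ids=205 | sensor_reading "System Temp"
```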

Threshold

Threshold sensors return analog readings (such as temperatures) that are compared against configured boundaries. Six distinct thresholds are configurable, and each can emit its own event when crossed. Both ipmi-sensors and ipmitool provide an overview of the thresholds:

>>> ipmi-sensors --output-sensor-thresholds --record-ids=205

>>> ipmi-sensors -t temperature --output-sensor-thresholds

>>> ipmitool sensor get "System Temp"
Locating sensor record...
Sensor ID              : System Temp (0xb)
 Entity ID             : 7.1
 Sensor Type (Analog)  : Temperature
 Sensor Reading        : 30 (+/- 0) degrees C
 Status                : ok
 Lower Non-Recoverable : -9.000
 Lower Critical        : -7.000
 Lower Non-Critical    : -5.000
 Upper Non-Critical    : 60.000
 Upper Critical        : 65.000
 Upper Non-Recoverable : 80.000
 Assertion Events      : 
 Assertions Enabled    : lcr- lnr- ucr+ unr+ 
 Deassertions Enabled  : lcr- lnr- ucr+ unr+
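
The reading can be compared with a threshold directly from this output. The sketch below assumes the "Label : value" layout printed above; check_upper_noncritical is a hypothetical helper name and the result strings ok, warn, and unknown are chosen for illustration only.

```shell
# check_upper_noncritical: read "ipmitool sensor get" output on stdin and
# print "warn" if the reading has reached the Upper Non-Critical threshold,
# "ok" if not, or "unknown" if no numeric threshold is present.
check_upper_noncritical() {
    awk -F':' '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
        }
        key == "Sensor Reading"     { reading = $2 + 0 }
        key == "Upper Non-Critical" { limit = $2 + 0; have = ($2 ~ /[0-9]/) }
        END {
            if (!have) { print "unknown" }
            else if (reading >= limit) { print "warn" }
            else { print "ok" }
        }'
}

# Usage: ipmitool sensor get "System Temp" | check_upper_noncritical
```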

ipmitool can also set the thresholds:

>>> ipmitool sensor thresh 'System Temp' upper 60 65 80
Locating sensor record 'System Temp'...
Setting sensor "System Temp" Upper Non-Critical threshold to 60.000
Setting sensor "System Temp" Upper Critical threshold to 65.000
Setting sensor "System Temp" Upper Non-Recoverable threshold to 80.000

>>> ipmitool sensor thresh 'System Temp' ucr 59
Locating sensor record 'System Temp'...
Setting sensor "System Temp" Upper Critical threshold to 59.000
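
Before writing thresholds to the BMC it can help to check that the three upper values are in ascending order. This is a sketch under that assumption; set_upper_thresholds is a hypothetical wrapper that only prints the ipmitool command instead of running it.

```shell
# set_upper_thresholds SENSOR UNC UCR UNR (hypothetical wrapper)
# Rejects values that are not strictly ascending; otherwise prints the
# "ipmitool sensor thresh" command that would set all three upper thresholds.
set_upper_thresholds() {
    sensor=$1 unc=$2 ucr=$3 unr=$4
    if awk -v a="$unc" -v b="$ucr" -v c="$unr" \
        'BEGIN { exit !(a + 0 < b + 0 && b + 0 < c + 0) }'; then
        echo ipmitool sensor thresh "'$sensor'" upper "$unc" "$ucr" "$unr"
    else
        echo "thresholds must be ascending: $unc < $ucr < $unr" >&2
        return 1
    fi
}

# Usage: set_upper_thresholds "System Temp" 60 65 80 | sh
```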

Events

The command ipmi-sensors-config shows which conditions emit an event:

>>> ipmi-sensors-config --listsections
[…]
>>> ipmi-sensors-config --checkout --section 205_System_Temp | grep -v '#'
[…]
Enable_All_Event_Messages                                                   Yes
Enable_Scanning_On_This_Sensor                                              Yes
Enable_Assertion_Event_Lower_Critical_Going_Low                             Yes
Enable_Assertion_Event_Lower_Non_Recoverable_Going_Low                      Yes
Enable_Assertion_Event_Upper_Critical_Going_High                            Yes
Enable_Assertion_Event_Upper_Non_Recoverable_Going_High                     Yes
Enable_Deassertion_Event_Lower_Critical_Going_Low                           Yes
Enable_Deassertion_Event_Lower_Non_Recoverable_Going_Low                    Yes
Enable_Deassertion_Event_Upper_Critical_Going_High                          Yes
Enable_Deassertion_Event_Upper_Non_Recoverable_Going_High                   Yes
[…]
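
To spot events that are switched off, the checkout output can be filtered for keys set to No. A minimal sketch, assuming the "KEY VALUE" layout shown above; disabled_events is a hypothetical helper name.

```shell
# disabled_events: read ipmi-sensors-config checkout output on stdin and
# print every assertion/deassertion event key whose value is "No".
disabled_events() {
    awk '$1 ~ /^Enable_(Assertion|Deassertion)_Event_/ && tolower($2) == "no" { print $1 }'
}

# Usage: ipmi-sensors-config --checkout --section 205_System_Temp | disabled_events
```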

Logs

The BMC sends event messages when it detects significant or critical system management events. Critical events should be captured in the System Event Log (SEL) if they are required for ‘post-mortem’ analysis or autonomous system response (e.g. power off). Critical events include:

  • Temperature threshold exceeded
  • Voltage threshold exceeded
  • Power or fan fault
  • Interrupts/signals that affect system operation (NMIs, PCI PERR (parity error), and SERR (system error))
  • Events that impact system data integrity (e.g. uncorrectable ECC errors) or system security (e.g. chassis intrusion)
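
The SEL can be read with ipmitool sel list (or ipmi-sel from FreeIPMI). The sketch below filters entries by sensor type; it assumes the pipe-separated layout "ID | date | time | sensor | event | direction", which may differ between ipmitool versions, and sel_by_type is a hypothetical helper name.

```shell
# sel_by_type TYPE: read "ipmitool sel list" output on stdin and print the
# entries whose sensor field (4th column) starts with TYPE.
sel_by_type() {
    awk -F'|' -v type="$1" '
        {
            s = $4
            gsub(/^[ \t]+/, "", s)          # drop column padding
            if (index(s, type) == 1) print  # sensor field begins with TYPE
        }'
}

# Usage: ipmitool sel list | sel_by_type Temperature
```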

Filters

Platform Event Filtering (PEF)

  • …mechanism to take selected actions on event messages
  • …event filtering is independent of event logging
  • Event Filter Table
    • …configures system actions to perform on a given event (like power-off)
    • …specification recommends supporting at least 16 entries
    • …subset of these entries should be pre-configured for common system failures
  • Alert Policy Table
    • …configures forwarding of event messages to a given destination
    • …typically another service in the LAN
# list PEF configuration sections
ipmi-config -g pef -L
#..or...
ipmi-pef-config -L

Print a specific section with option --section:

# For example
>>> ipmi-pef-config -L | grep -i event
Event_Filter_1
Event_Filter_2

# ...print a specific section
>>> ipmi-pef-config -o -S Event_Filter_1

Retrieve the currently running configuration from the BMC with option --checkout. To load modifications into the BMC, use option --commit together with --filename=PATH:

ipmi-pef-config --checkout > ipmi-pef.conf
ipmi-pef-config --commit --filename=ipmi-pef.conf

A single key can be configured with the options --commit --key-pair="SECTION:KEY=VALUE", where the section name and the configuration key are separated by a colon:

ipmi-pef-config -c -e Event_Filter_16:Sensor_Type=Temperature
ipmi-pef-config -c -e Event_Filter_16:Event_Severity=Critical
ipmi-pef-config -c -e Event_Filter_16:Event_Filter_Action_Power_Off=yes
ipmi-pef-config -c -e Event_Filter_16:Enable_Filter=yes
ipmi-pef-config -c -e Event_Filter_16:Generator_Id_Byte_1=0xFF
ipmi-pef-config -c -e Event_Filter_16:Generator_Id_Byte_2=0xFF
ipmi-pef-config -c -e Event_Filter_16:Sensor_Number=0xFF
ipmi-pef-config -c -e Event_Filter_16:Event_Trigger=0xFF
ipmi-pef-config -c -e Event_Filter_16:Event_Data1_Offset_Mask=0xFFFF
ipmi-pef-config -c -e Event_Filter_16:Event_Data1_Compare1=0xFF
ipmi-pef-config -c -e Event_Filter_16:Event_Data2_Compare1=0xFF
ipmi-pef-config -c -e Event_Filter_16:Event_Data3_Compare1=0xFF
ipmi-pef-config -c -e PEF_Conf:Enable_Power_Down_Action=Yes
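
The repeated commands for one section can be generated in a loop. This is a minimal sketch; pef_commands is a hypothetical helper that prints one ipmi-pef-config call per KEY=VALUE pair rather than executing it.

```shell
# pef_commands SECTION KEY=VALUE... (hypothetical helper)
# Prints one "ipmi-pef-config -c -e SECTION:KEY=VALUE" line per pair.
pef_commands() {
    section=$1
    shift
    for pair in "$@"; do
        echo "ipmi-pef-config -c -e ${section}:${pair}"
    done
}

# Usage:
#   pef_commands Event_Filter_16 Sensor_Type=Temperature \
#       Event_Severity=Critical Enable_Filter=yes | sh
```

Reviewing the printed commands before piping them to a shell avoids committing a partial filter by accident.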