Identifying a service action by using system event logs

Use the Intelligent Platform Management Interface (IPMI) program to examine system event logs (SELs) to identify a service action.

  1. Use the ipmitool command to examine SELs.
    • To list SELs by using an in-band network, use the following command:

      ipmitool sel elist

    • To list SELs remotely over the LAN, use the following command:
      ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel elist
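
The two command forms in step 1 differ only in the connection options. As a minimal sketch, the following Python helper assembles either form as an argument list that could be passed to subprocess.run. The function name build_sel_command and the sample host, user name, and password are illustrative assumptions and are not part of ipmitool.

```python
from typing import List, Optional


def build_sel_command(
    sel_args: List[str],
    host: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Return the ipmitool argument list for an in-band or LAN request."""
    command = ["ipmitool"]
    if host is not None:
        # Remote access over the LAN uses the lanplus interface and credentials.
        if username is None or password is None:
            raise ValueError("username and password are required for LAN access")
        command += ["-I", "lanplus", "-U", username, "-P", password, "-H", host]
    return command + list(sel_args)


# In-band request.
print(build_sel_command(["sel", "elist"]))
# LAN request; the host and credentials are placeholders.
print(build_sel_command(["sel", "elist"], host="bmc.example.com",
                        username="admin", password="secret"))
```

Building a list instead of a single string avoids shell quoting problems when a password contains special characters.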
  2. Scan the SELs for an event with the value OEM record de. Did you find a SEL with the value OEM record de?
    If Then
    Yes: Continue with the next step.
    No: Go to step 4.
  3. The OEM record de specific log information is indicated by the rightmost digits of the SEL with the value OEM record de. Use Table 1 to determine the service action to perform.
    Table 1. OEM record de specific log information and service action
    OEM record de specific log information Service action
    00xxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    01xxxxxxxxxx Go to the EPUB_PRC_FIND_DECONFIGURE_PART isolation procedure.
    04xxxxxxxxxx Go to the EPUB_PRC_SP_CODE isolation procedure.
    05xxxxxxxxxx Go to the EPUB_PRC_PHYP_CODE isolation procedure.
    08xxxxxxxxxx Go to the EPUB_PRC_ALL_PROCS isolation procedure.
    09xxxxxxxxxx Go to the EPUB_PRC_ALL_MEMCRDS isolation procedure.
    0Axxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    10xxxxxxxxxx Go to the EPUB_PRC_LVL_SUPPORT isolation procedure.
    16xxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    1Cxxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    22xxxxxxxxxx Go to the EPUB_PRC_MEMORY_PLUGGING_ERROR isolation procedure.
    2Dxxxxxxxxxx Go to the EPUB_PRC_FSI_PATH isolation procedure.
    30xxxxxxxxxx Go to the EPUB_PRC_PROC_AB_BUS isolation procedure.
    31xxxxxxxxxx Go to the EPUB_PRC_PROC_XYZ_BUS isolation procedure.
    34xxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    37xxxxxxxxxx Go to the EPUB_PRC_EIBUS_ERROR isolation procedure.
    3Fxxxxxxxxxx Go to the EPUB_PRC_POWER_ERROR isolation procedure.
    4Dxxxxxxxxxx Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
    4Fxxxxxxxxxx Go to the EPUB_PRC_MEMORY_UE isolation procedure.
    55xxxxxxxxxx Go to the EPUB_PRC_HB_CODE isolation procedure.
    56xxxxxxxxxx Go to the EPUB_PRC_TOD_CLOCK_ERR isolation procedure.
    5Cxxxxxxxxxx Go to the EPUB_PRC_COOLING_SYSTEM_ERR isolation procedure.
    5Exxxxxxxxxx Go to the EPUB_PRC_GPU_ISOLATION_PROCEDURE isolation procedure.
    This ends the procedure.
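
The lookup in step 3 can be sketched in Python as follows. The sketch assumes that each line of the SEL list is a set of fields separated by the | character, that the OEM record de specific log information is the last field, and that its first two hexadecimal digits select the row in Table 1. The sample line and the names TABLE_1 and oem_de_service_action are illustrative only. Verify the field layout against the output of your own system.

```python
from typing import Optional

FIRMWARE_UPDATE = (
    "Getting fixes (update the system firmware); if the event recurs, "
    "Collecting diagnostic data, then Contacting IBM service and support"
)

# First two hexadecimal digits of the log information -> Table 1 service action.
TABLE_1 = {
    "00": FIRMWARE_UPDATE,
    "01": "EPUB_PRC_FIND_DECONFIGURE_PART",
    "04": "EPUB_PRC_SP_CODE",
    "05": "EPUB_PRC_PHYP_CODE",
    "08": "EPUB_PRC_ALL_PROCS",
    "09": "EPUB_PRC_ALL_MEMCRDS",
    "0A": FIRMWARE_UPDATE,
    "10": "EPUB_PRC_LVL_SUPPORT",
    "16": FIRMWARE_UPDATE,
    "1C": FIRMWARE_UPDATE,
    "22": "EPUB_PRC_MEMORY_PLUGGING_ERROR",
    "2D": "EPUB_PRC_FSI_PATH",
    "30": "EPUB_PRC_PROC_AB_BUS",
    "31": "EPUB_PRC_PROC_XYZ_BUS",
    "34": FIRMWARE_UPDATE,
    "37": "EPUB_PRC_EIBUS_ERROR",
    "3F": "EPUB_PRC_POWER_ERROR",
    "4D": FIRMWARE_UPDATE,
    "4F": "EPUB_PRC_MEMORY_UE",
    "55": "EPUB_PRC_HB_CODE",
    "56": "EPUB_PRC_TOD_CLOCK_ERR",
    "5C": "EPUB_PRC_COOLING_SYSTEM_ERR",
    "5E": "EPUB_PRC_GPU_ISOLATION_PROCEDURE",
}


def oem_de_service_action(sel_line: str) -> Optional[str]:
    """Return the Table 1 entry for an OEM record de line, else None."""
    fields = [field.strip() for field in sel_line.split("|")]
    if "OEM record de" not in fields:
        return None  # Not an OEM record de event.
    log_info = fields[-1]  # Assumed: rightmost field holds the log information.
    return TABLE_1.get(log_info[:2].upper())  # None if the code is not in Table 1.


# Hypothetical SEL line, for illustration only.
sample = "  1a | 04/15/2016 | 10:21:11 | OEM record de | 000000 | 040020000000"
print(oem_de_service_action(sample))
```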
  4. Scan the SELs for an event with the value OEM record df. Did you find a SEL with the value OEM record df?
    If Then
    Yes: Continue with the next step.
    No: Go to step 10.
  5. One or more events might be logged around the same time as the event with the value OEM record df. These events require a service action if they meet the following criteria:
    • A service action keyword is present. For a list of service action keywords, see Identifying service action keywords in system event logs.
    • Asserted is in the description.
    • OEM record is not in the description.
    • The event has a time stamp in close proximity to the time stamp of the event with the value OEM record df.
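
The four criteria in step 5 can be expressed as a single filter. The following Python sketch is one possible reading. The SelEvent class, the keyword list, and the five-minute window are assumptions: the actual keywords are listed in Identifying service action keywords in system event logs, and this procedure does not define how close the time stamps must be.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class SelEvent:
    record_id: str
    timestamp: datetime
    description: str


def needs_service_action(
    event: SelEvent,
    df_time: datetime,
    keywords: Iterable[str],
    window: timedelta = timedelta(minutes=5),  # Assumed meaning of "close proximity".
) -> bool:
    """Apply the four criteria from step 5 to one SEL event."""
    text = event.description
    return (
        any(keyword in text for keyword in keywords)
        # Case-sensitive on purpose: "Deasserted" does not contain "Asserted".
        and "Asserted" in text
        and "OEM record" not in text
        and abs(event.timestamp - df_time) <= window
    )


# Hypothetical data, for illustration only.
df_time = datetime(2021, 1, 1, 12, 0, 0)
event = SelEvent("1b", datetime(2021, 1, 1, 12, 0, 30),
                 "Fan Fan 1 | Lower Critical going low | Asserted")
print(needs_service_action(event, df_time, keywords=["Fan"]))
```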
  6. Did you find any SEL events that require a service action as defined in step 5?
    If Then
    Yes: Continue with the next step.
    No: Go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
  7. Did you find only one SEL event that requires a service action as defined in step 5?
    If Then
    Yes: Continue with the next step.
    No: Go to step 9.
  8. Record the SEL record ID for the event that you identified in step 5. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display the SEL details.
    • To display SEL details by using an in-band network, use the following command:

      ipmitool sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    • To display SEL details remotely over the LAN, use the following command:

      ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use the following information to determine the service action to perform:

    This ends the procedure.
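
Because the sel get command expects the record ID in hexadecimal format, it can help to normalize the value before you build the command. The following Python sketch assumes that the leftmost digits in the SEL list are already hexadecimal but have no 0x prefix. The function name format_record_id is illustrative.

```python
def format_record_id(record_id: str) -> str:
    """Normalize a SEL record ID such as '1a' or '0x1A' to the form '0x1a'."""
    value = int(record_id.strip(), 16)  # Accepts input with or without the 0x prefix.
    return f"0x{value:x}"


print(format_record_id("1a"))
```

The result can be substituted for <SEL record ID> in either form of the sel get command.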

  9. You identified more than one event in step 5. The service actions for all of the events that were identified in step 5 must be performed to successfully complete the repair. Record the SEL record IDs for the events that you identified in step 5. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display SEL details for each SEL record ID that you recorded.
    • To display SEL details by using an in-band network, use the following command:

      ipmitool sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    • To display SEL details remotely over the LAN, use the following command:

      ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use this information to determine the service action to perform:

    This ends the procedure.
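
When several records must be examined, the sensor name and sensor ID can be extracted from each Sensor ID line programmatically. The following Python sketch assumes that the line has the form Sensor ID : sensor name (sensor ID), as described in this step. The sample line, the pattern, and the function name parse_sensor_id are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

# Matches "Sensor ID : <sensor name> (<sensor ID>)" with flexible spacing.
SENSOR_ID_PATTERN = re.compile(
    r"^\s*Sensor ID\s*:\s*(?P<name>.+?)\s*\((?P<id>[^()]+)\)\s*$"
)


def parse_sensor_id(line: str) -> Optional[Tuple[str, str]]:
    """Return (sensor name, sensor ID) from a Sensor ID line, else None."""
    match = SENSOR_ID_PATTERN.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("id")


# Hypothetical line from the SEL details, for illustration only.
print(parse_sensor_id("Sensor ID              : Fan 1 (0x5a)"))
```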

  10. Scan the SELs for an event with the value OEM record c0.
  11. Did you find an event with the value OEM record c0?
    If Then
    Yes: Continue with the next step.
    No: Go to step 13.
  12. The OEM record c0 specific log information is indicated by the rightmost digits of the SEL with the value OEM record c0. If your system is an 8335-GCA or 8335-GTA, use Table 2 to determine the service action to perform. If your system is an 8335-GTB, use Table 3 to determine the service action to perform. If your system is an 8348-21C, use Table 4 to determine the service action to perform.
    Table 2. OEM record c0 specific log information, description, and service action for an 8335-GCA or 8335-GTA
    OEM record c0 specific log information Description Service action
    320a01xxxxxx Phy read failure If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system.
    320a02xxxxxx Phy speed and duplex failure
    320exxxxxxxx OCC reset required This event is for information only. No service action is required.
    3a0400xxxxxx Chassis soft power off A user-initiated power-off request occurred. No service action is required.
    3a0402xxxxxx Chassis soft reboot
    3a0701xxxxxx Request for PNOR access This event is for information only. No service action is required.
    3a0702xxxxxx Release of PNOR access
    3a1100xxxxxx Fan thread stopped
    3a1101xxxxxx Fan thread started
    3a1503xxxxxx Primary side boot failed Go to Resolving a system firmware boot failure.
    3a1504xxxxxx Golden side boot failed Go to Resolving a system firmware boot failure.
    3a1601xxxxxx Fan 1 failure Replace Fan 1. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    3a1602xxxxxx Fan 2 failure Replace Fan 2. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    3a1603xxxxxx Fan 3 failure Replace Fan 3. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    3a1604xxxxxx Fan 4 failure Replace Fan 4. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    3a260xyyyyyy, where x = 1, 2, or 3 System shut down due to one or more missing or failed fans The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    3a2604yyyyyy All of the fans are missing or failed Ensure that the fan power cable and the disk and fan signal cable are seated properly. If the problem persists, replace the following items, one at a time, until the problem is resolved:
    Note: Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure.
    • Power riser with time-of-day battery slot
    • Fan power cable
    • Disk and fan signal cable
    • Disk drive and fan card
    Table 3. OEM record c0 specific log information, description, and service action for an 8335-GTB
    OEM record c0 specific log information Description Service action
    320a01xxxxxx Phy read failure If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system.
    320a02xxxxxx Phy speed and duplex failure
    320exxxxxxxx OCC reset required This event is for information only. No service action is required.
    3a0400xxxxxx Chassis soft power off A user-initiated power-off request occurred. No service action is required.
    3a0402xxxxxx Chassis soft reboot
    3a0701xxxxxx Request for PNOR access This event is for information only. No service action is required.
    3a0702xxxxxx Release of PNOR access
    3a1100xxxxxx Fan thread stopped
    3a1101xxxxxx Fan thread started
    3a1503xxxxxx Primary side boot failed Go to Resolving a system firmware boot failure.
    3a1504xxxxxx Golden side boot failed Go to Resolving a system firmware boot failure.
    3a1601xxxxxx Fan 1 failure Replace Fan 1. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    3a1602xxxxxx Fan 2 failure Replace Fan 2. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    3a1603xxxxxx Fan 3 failure Replace Fan 3. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    3a1604xxxxxx Fan 4 failure Replace Fan 4. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    3a2600xxxxxx The water-cooled system shut down due to too many processor core sensors reading a temperature at or above the maximum temperature that is allowed. At least one processor is overheating. Go to Resolving an over temperature problem for a water-cooled 8335-GTB system.
    3a260xyyyyyy, where x = 1, 2, or 3 System shut down due to one or more missing or failed fans The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    3a2604yyyyyy All of the fans are missing or failed Ensure that the fan power cable and the disk and fan signal cable are seated properly. If the problem persists, replace the following items, one at a time, until the problem is resolved:
    Note: Go to 8335-GTB locations to identify the physical location and removal and replacement procedure.
    • Power riser with time-of-day battery slot
    • Fan power cable
    • Disk and fan signal cable
    • Disk drive and fan card
    Table 4. OEM record c0 specific log information, description, and service action for an 8348-21C
    OEM record c0 specific log information Description Service action
    320a01xxxxxx Phy read failure If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system.
    320a02xxxxxx Phy speed and duplex failure
    320exxxxxxxx OCC reset required This event is for information only. No service action is required.
    3a0400xxxxxx Chassis soft power off A user-initiated power-off request occurred. No service action is required.
    3a0402xxxxxx Chassis soft reboot
    3a0701xxxxxx Request for PNOR access This event is for information only. No service action is required.
    3a0702xxxxxx Release of PNOR access
    3a1100xxxxxx Fan thread stopped
    3a1101xxxxxx Fan thread started
    3a1503xxxxxx Primary side boot failed Go to Resolving a system firmware boot failure.
    3a1504xxxxxx Golden side boot failed Go to Resolving a system firmware boot failure.
    3a1601xxxxxx Fan 1 failure Replace Fan 1. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a1602xxxxxx Fan 2 failure Replace Fan 2. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a1603xxxxxx Fan 3 failure Replace Fan 3. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a1604xxxxxx Fan 4 failure Replace Fan 4. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a1605xxxxxx Fan 5 failure Replace Fan 5. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a260xyyyyyy, where x = 1, 2, 3, or 4 System shut down due to one or more missing or failed fans The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing or failed fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
    3a2605yyyyyy All of the fans are missing or failed Replace the disk drive backplane. Go to 8348-21C locations to identify the physical location and removal and replacement procedure.
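
Step 12 first selects a table by machine type and model, and then matches the leading digits of the OEM record c0 specific log information. The following Python sketch shows the table selection and the decoding of the fan failure codes (3a160N, where N is the fan number). The names TABLE_FOR_MODEL and failed_fan_number are illustrative. The fan limits are read from the tables: Fan 1 to Fan 4 in Table 2 and Table 3, and Fan 1 to Fan 5 in Table 4.

```python
from typing import Optional

# Machine type and model -> table to use in step 12.
TABLE_FOR_MODEL = {
    "8335-GCA": "Table 2",
    "8335-GTA": "Table 2",
    "8335-GTB": "Table 3",
    "8348-21C": "Table 4",
}


def failed_fan_number(model: str, log_info: str) -> Optional[int]:
    """Return the fan number for a 3a160Nxxxxxx code, else None."""
    table = TABLE_FOR_MODEL[model]  # Raises KeyError for an unlisted model.
    max_fan = 5 if table == "Table 4" else 4  # Only Table 4 lists Fan 5.
    code = log_info.strip().lower()
    if len(code) != 12 or not code.startswith("3a160"):
        return None
    digit = code[5]
    if digit.isdigit() and 1 <= int(digit) <= max_fan:
        return int(digit)
    return None


# Hypothetical log information values, for illustration only.
print(TABLE_FOR_MODEL["8335-GTB"])
print(failed_fan_number("8348-21C", "3a1605000000"))
```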
  13. One or more SEL events might require a service action. These events require a service action if they meet the following criteria:
  14. Did you find one or more SEL events that require a service action as defined in step 13?
    If Then
    Yes: Continue with the next step.
    No: This ends the procedure.
  15. The service actions for all of the events that were identified in step 13 must be performed to successfully complete the repair. Record the SEL record IDs for the events that you identified in step 13. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display SEL details for each SEL record ID that you recorded.
    • To display SEL details by using an in-band network, use the following command:

      ipmitool sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    • To display SEL details remotely over the LAN, use the following command:

      ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>

      Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
    The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use this information to determine the service action to perform:

    This ends the procedure.
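
The branching in steps 2, 4, and 11 checks for OEM record de, then OEM record df, then OEM record c0, and falls through to step 13 if none is found. As a rough sketch, the following Python function returns the number of the step that the procedure continues with. It assumes that the SEL list is available as a list of text lines. The function name first_action_step and the sample lines are illustrative.

```python
from typing import List


def first_action_step(sel_lines: List[str]) -> int:
    """Return the step to continue with, following steps 2, 4, and 11."""
    text = "\n".join(sel_lines)
    if "OEM record de" in text:
        return 3   # Use Table 1.
    if "OEM record df" in text:
        return 5   # Look for related events near the df event.
    if "OEM record c0" in text:
        return 12  # Use Table 2, Table 3, or Table 4.
    return 13      # Check the remaining events for a service action.


# Hypothetical SEL lines, for illustration only.
lines = [
    "  1a | 04/15/2016 | 10:21:11 | OEM record df | 000000 | 040020000000",
    "  1b | 04/15/2016 | 10:21:12 | OEM record c0 | 000000 | 3a1601000000",
]
print(first_action_step(lines))
```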




Last updated: Thu, December 02, 2021