Identifying a service action by using system event logs
Use the Intelligent Platform Management Interface (IPMI) program to examine system event logs (SELs) to identify a service action.
- Use the ipmitool command to examine SELs.
- To list SELs by using an in-band network, use the following command:
ipmitool sel elist
- To list SELs remotely over the LAN, use the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel elist
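The two invocations above differ only in the interface options that precede the subcommand. The following is a minimal Python sketch of assembling either form as an argument list. The function name `build_ipmitool_command`, its parameters, and the host and credentials shown are illustrative assumptions, not part of ipmitool or of this procedure.

```python
def build_ipmitool_command(subcommand, host=None, username=None, password=None):
    """Return the ipmitool argument list for an in-band or LAN invocation.

    subcommand is a list such as ["sel", "elist"]. When host is None the
    command is built for in-band use; otherwise the lanplus interface
    options from the procedure are added. Hypothetical helper.
    """
    command = ["ipmitool"]
    if host is not None:
        command += ["-I", "lanplus", "-U", username, "-P", password, "-H", host]
    return command + list(subcommand)


# In-band listing of the SELs.
in_band = build_ipmitool_command(["sel", "elist"])

# Remote listing over the LAN; the host and credentials are placeholders.
remote = build_ipmitool_command(
    ["sel", "elist"], host="bmc.example.com", username="admin", password="secret"
)
```

On a system where ipmitool is installed, either list could be passed to `subprocess.run(command, capture_output=True, text=True)` to capture the listing.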
- Scan the SELs for an event with the value OEM record de. Did you find a SEL with the value OEM record de?
Yes: Continue with the next step.
No: Go to step 4.
- The OEM record de specific log information is indicated by the rightmost digits of the SEL with the value OEM record de. Use Table 1 to determine the service action to perform. This ends the procedure.
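Steps 2 and 3 amount to finding the OEM record de entry, reading its rightmost field, and matching the leading two digits against Table 1. The sketch below assumes that each line of the listing is pipe-separated, with the record ID leftmost and the specific log information rightmost. The sample line is made up for illustration, and only a few Table 1 rows are copied into the dictionary.

```python
# A few rows of Table 1, keyed by the first two digits of the specific log
# information. The remaining rows would be added the same way.
TABLE_1 = {
    "00": "Go to Getting fixes and update the system firmware",
    "01": "EPUB_PRC_FIND_DECONFIGURE_PART",
    "04": "EPUB_PRC_SP_CODE",
    "05": "EPUB_PRC_PHYP_CODE",
    "4F": "EPUB_PRC_MEMORY_UE",
    "55": "EPUB_PRC_HB_CODE",
}


def find_oem_records(sel_lines, record_type):
    """Return (record ID, specific log information) for each matching line.

    Assumes pipe-separated fields with the record ID leftmost and the
    specific log information rightmost.
    """
    marker = "OEM record " + record_type
    matches = []
    for line in sel_lines:
        fields = [field.strip() for field in line.split("|")]
        if marker in fields:
            matches.append((fields[0], fields[-1]))
    return matches


def table_1_action(specific_log_information):
    """Look up the service action by the first two digits, ignoring case."""
    return TABLE_1.get(specific_log_information[:2].upper())


# Hypothetical sample line, not captured from a real system.
sample = ["   3 | 04/20/2016 | 11:22:33 | OEM record de | 000000 | 040020000000"]
record_id, info = find_oem_records(sample, "de")[0]
action = table_1_action(info)
```

A lookup that returns `None` means the code is not in the copied rows, in which case the full table should be consulted.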
Table 1. OEM record de specific log information and service action

| OEM record de specific log information | Service action |
| --- | --- |
| 00xxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 01xxxxxxxxxx | Go to the EPUB_PRC_FIND_DECONFIGURE_PART isolation procedure. |
| 04xxxxxxxxxx | Go to the EPUB_PRC_SP_CODE isolation procedure. |
| 05xxxxxxxxxx | Go to the EPUB_PRC_PHYP_CODE isolation procedure. |
| 08xxxxxxxxxx | Go to the EPUB_PRC_ALL_PROCS isolation procedure. |
| 09xxxxxxxxxx | Go to the EPUB_PRC_ALL_MEMCRDS isolation procedure. |
| 0Axxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 10xxxxxxxxxx | Go to the EPUB_PRC_LVL_SUPPORT isolation procedure. |
| 16xxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 1Cxxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 22xxxxxxxxxx | Go to the EPUB_PRC_MEMORY_PLUGGING_ERROR isolation procedure. |
| 2Dxxxxxxxxxx | Go to the EPUB_PRC_FSI_PATH isolation procedure. |
| 30xxxxxxxxxx | Go to the EPUB_PRC_PROC_AB_BUS isolation procedure. |
| 31xxxxxxxxxx | Go to the EPUB_PRC_PROC_XYZ_BUS isolation procedure. |
| 34xxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 37xxxxxxxxxx | Go to the EPUB_PRC_EIBUS_ERROR isolation procedure. |
| 3Fxxxxxxxxxx | Go to the EPUB_PRC_POWER_ERROR isolation procedure. |
| 4Dxxxxxxxxxx | Go to Getting fixes and update the system firmware to the most recent level of firmware that is available. If this SEL event continues to be logged, go to Collecting diagnostic data. Then, go to Contacting IBM service and support. |
| 4Fxxxxxxxxxx | Go to the EPUB_PRC_MEMORY_UE isolation procedure. |
| 55xxxxxxxxxx | Go to the EPUB_PRC_HB_CODE isolation procedure. |
| 56xxxxxxxxxx | Go to the EPUB_PRC_TOD_CLOCK_ERR isolation procedure. |
| 5Cxxxxxxxxxx | Go to the EPUB_PRC_COOLING_SYSTEM_ERR isolation procedure. |
| 5Exxxxxxxxxx | Go to the EPUB_PRC_GPU_ISOLATION_PROCEDURE isolation procedure. |

- Scan the SELs for an event with the value OEM record df. Did you find a SEL with the value OEM record df?
Yes: Continue with the next step.
No: Go to step 10.
- One or more events might be logged around the same time as the event with the value OEM record df. These events require a service action if they meet the following criteria:
- A service action keyword is present. For a list of service action keywords, see Identifying service action keywords in system event logs.
- Asserted is in the description.
- OEM record is not in the description.
- The event has a time stamp in close proximity to the time stamp of the event with the value OEM record df.
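The four criteria above can be applied mechanically to each line of the listing. The sketch below makes several assumptions that are not stated in this procedure: the keyword tuple is a placeholder for the list in Identifying service action keywords in system event logs, "close proximity" is taken to mean within 60 seconds, the lines are pipe-separated with the date and time in the second and third fields, and the sample lines are invented.

```python
from datetime import datetime, timedelta

# Placeholder keywords; the real list is in "Identifying service action
# keywords in system event logs" and is not reproduced here.
SERVICE_ACTION_KEYWORDS = ("Critical", "Failure")

# The procedure only says "close proximity"; 60 seconds is an assumption.
PROXIMITY = timedelta(seconds=60)


def parse_timestamp(date_text, time_text):
    """Parse date and time fields, assumed to be MM/DD/YYYY and HH:MM:SS."""
    return datetime.strptime(date_text + " " + time_text, "%m/%d/%Y %H:%M:%S")


def requires_service_action(line, reference_time):
    """Apply the four criteria from step 5 to one pipe-separated SEL line."""
    fields = [field.strip() for field in line.split("|")]
    description = " | ".join(fields[3:])
    if "OEM record" in description:
        return False
    # Case-sensitive on purpose, so that "Deasserted" does not match.
    if "Asserted" not in description:
        return False
    if not any(keyword in description for keyword in SERVICE_ACTION_KEYWORDS):
        return False
    event_time = parse_timestamp(fields[1], fields[2])
    return abs(event_time - reference_time) <= PROXIMITY


# Hypothetical sample lines, not captured from a real system.
df_line = "   5 | 04/20/2016 | 11:22:40 | OEM record df | 000000 | 040020000000"
candidate = "   4 | 04/20/2016 | 11:22:33 | Fan #0x30 | Lower Critical going low | Asserted"

reference_time = parse_timestamp("04/20/2016", "11:22:40")
flagged = [
    line for line in (candidate, df_line)
    if requires_service_action(line, reference_time)
]
```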
- Did you find any SEL events that require a service action as defined in step 5?
Yes: Continue with the next step.
No: Go to Collecting diagnostic data. Then, go to Contacting IBM service and support.
- Did you find only one SEL event that requires a service action as defined in step 5?
Yes: Continue with the next step.
No: Go to step 9.
- Record the SEL record ID for the event you identified in step 5. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display the SEL details.
- To display SEL details by using an in-band network, use the following command:
ipmitool sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
- To display SEL details remotely over the LAN, use the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use the following information to determine the service action to perform:
- If your system is an 8335-GCA or 8335-GTA, go to Identifying a service action by using sensor and event information for the 8335-GCA and 8335-GTA.
- If your system is an 8335-GTB, go to Identifying a service action by using sensor and event information for the 8335-GTB.
- If your system is an 8348-21C, go to Identifying a service action by using sensor and event information for the 8348-21C.
This ends the procedure.
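Because the record ID must be entered with the 0x prefix, it can help to normalize the leftmost digits of a listing line before building the sel get command. This sketch assumes that the listing shows the record ID in hexadecimal without a prefix. Both helper names are hypothetical.

```python
def to_hex_record_id(leftmost_digits):
    """Convert the leftmost digits of a SEL line to the 0x form, e.g. 0x1a.

    Assumes the listing shows the record ID in hexadecimal without a prefix.
    """
    return hex(int(leftmost_digits.strip(), 16))


def sel_get_command(record_id, lan_options=None):
    """Build the sel get argument list; lan_options is empty for in-band use."""
    return ["ipmitool"] + list(lan_options or []) + ["sel", "get", record_id]


record_id = to_hex_record_id("  1a")
in_band = sel_get_command(record_id)

# Remote form; the host and credentials are placeholders.
remote = sel_get_command(
    record_id,
    ["-I", "lanplus", "-U", "admin", "-P", "secret", "-H", "bmc.example.com"],
)
```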
- You identified more than one event in step 5. The service actions for all of the events that were identified in step 5 must be performed to successfully complete the repair. Record the SEL record IDs for the events that you identified in step 5. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display SEL details for each SEL record ID that you recorded.
- To display SEL details by using an in-band network, use the following command:
ipmitool sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
- To display SEL details remotely over the LAN, use the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use this information to determine the service action to perform:
- If your system is an 8335-GCA or 8335-GTA, go to Identifying a service action by using sensor and event information for the 8335-GCA and 8335-GTA.
- If your system is an 8335-GTB, go to Identifying a service action by using sensor and event information for the 8335-GTB.
- If your system is an 8348-21C, go to Identifying a service action by using sensor and event information for the 8348-21C.
This ends the procedure.
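When several events must be looked up, the sensor name and sensor ID can be pulled out of each sel get result in the same way. The sketch below relies only on the format given above, sensor name (sensor ID), in a field labeled Sensor ID. The sample output and the function name are invented for illustration.

```python
import re

# Matches a line such as "Sensor ID : Fan 1 (0x30)".
SENSOR_ID_PATTERN = re.compile(
    r"^\s*Sensor ID\s*:\s*(?P<name>.+?)\s*\((?P<id>[^)]+)\)\s*$"
)


def parse_sensor_id(sel_get_output):
    """Return (sensor name, sensor ID) from sel get output, or None."""
    for line in sel_get_output.splitlines():
        match = SENSOR_ID_PATTERN.match(line)
        if match:
            return match.group("name"), match.group("id")
    return None


# Hypothetical output for two recorded events, keyed by SEL record ID.
outputs = {
    "0x1a": "SEL Record ID : 001a\n Sensor ID : Fan 1 (0x30)\n",
    "0x1b": "SEL Record ID : 001b\n Sensor ID : Ambient Temp (0x12)\n",
}
sensors = {record_id: parse_sensor_id(text) for record_id, text in outputs.items()}
```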
- Scan the SELs for an event with the value OEM record c0.
- Did you find an event with the value OEM record c0?
Yes: Continue with the next step.
No: Go to step 13.
- The OEM record c0 specific log information is indicated by the rightmost digits of the SEL with the value OEM record c0. If your system is an 8335-GCA or 8335-GTA, use Table 2 to determine the service action to perform. If your system is an 8335-GTB, use Table 3 to determine the service action to perform. If your system is an 8348-21C, use Table 4 to determine the service action to perform.
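Step 12 combines two lookups: the system model selects the table, and the specific log information is matched against table entries in which x and y stand for any digit. The sketch below copies the model-to-table mapping from the step and a handful of rows that appear in all three tables. The helper names are hypothetical.

```python
# Which table applies to which system, as stated in step 12.
TABLE_FOR_MODEL = {"8335-GCA": 2, "8335-GTA": 2, "8335-GTB": 3, "8348-21C": 4}

# A few rows that appear in Tables 2, 3, and 4 (pattern, description).
COMMON_ROWS = [
    ("320a01xxxxxx", "Phy read failure"),
    ("320exxxxxxxx", "OCC reset required"),
    ("3a1503xxxxxx", "Primary side boot failed"),
    ("3a1601xxxxxx", "Fan 1 failure"),
]


def matches_pattern(pattern, value):
    """Compare a table pattern with a value; x and y match any digit."""
    if len(pattern) != len(value):
        return False
    return all(
        p in "xy" or p == v for p, v in zip(pattern.lower(), value.lower())
    )


def describe(value, rows=COMMON_ROWS):
    """Return the description of the first matching row, or None."""
    for pattern, description in rows:
        if matches_pattern(pattern, value):
            return description
    return None


table_number = TABLE_FOR_MODEL["8335-GTB"]
description = describe("3a1601000000")
```

Treating x and y as wildcards is safe here because neither letter is a hexadecimal digit.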
Table 2. OEM record c0 specific log information, description, and service action for an 8335-GCA or 8335-GTA

| OEM record c0 specific log information | Description | Service action |
| --- | --- | --- |
| 320a01xxxxxx | Phy read failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320a02xxxxxx | Phy speed and duplex failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320exxxxxxxx | OCC reset required | This event is for information only. No service action is required. |
| 3a0400xxxxxx | Chassis soft power off | A user initiated power off request occurred. No service action is required. |
| 3a0402xxxxxx | Chassis soft reboot | A user initiated power off request occurred. No service action is required. |
| 3a0701xxxxxx | Request for PNOR access | This event is for information only. No service action is required. |
| 3a0702xxxxxx | Release of PNOR access | This event is for information only. No service action is required. |
| 3a1100xxxxxx | Fan thread stopped | This event is for information only. No service action is required. |
| 3a1101xxxxxx | Fan thread started | This event is for information only. No service action is required. |
| 3a1503xxxxxx | Primary side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1504xxxxxx | Golden side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1601xxxxxx | Fan 1 failure | Replace Fan 1. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |
| 3a1602xxxxxx | Fan 2 failure | Replace Fan 2. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |
| 3a1603xxxxxx | Fan 3 failure | Replace Fan 3. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |
| 3a1604xxxxxx | Fan 4 failure | Replace Fan 4. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |
| 3a260xyyyyyy, where x = 1, 2, or 3 | System shut down due to one or more missing or failed fans | The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |
| 3a2604yyyyyy | All of the fans are missing or failed | Ensure that the fan power cable and the disk and fan signal cable are seated properly. If the problem persists, replace the following items, one at a time, until the problem is resolved: (1) power riser with time-of-day battery slot, (2) fan power cable, (3) disk and fan signal cable, (4) disk drive and fan card. Note: Go to 8335-GCA and 8335-GTA locations to identify the physical location and removal and replacement procedure. |

Table 3. OEM record c0 specific log information, description, and service action for an 8335-GTB

| OEM record c0 specific log information | Description | Service action |
| --- | --- | --- |
| 320a01xxxxxx | Phy read failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320a02xxxxxx | Phy speed and duplex failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320exxxxxxxx | OCC reset required | This event is for information only. No service action is required. |
| 3a0400xxxxxx | Chassis soft power off | A user initiated power off request occurred. No service action is required. |
| 3a0402xxxxxx | Chassis soft reboot | A user initiated power off request occurred. No service action is required. |
| 3a0701xxxxxx | Request for PNOR access | This event is for information only. No service action is required. |
| 3a0702xxxxxx | Release of PNOR access | This event is for information only. No service action is required. |
| 3a1100xxxxxx | Fan thread stopped | This event is for information only. No service action is required. |
| 3a1101xxxxxx | Fan thread started | This event is for information only. No service action is required. |
| 3a1503xxxxxx | Primary side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1504xxxxxx | Golden side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1601xxxxxx | Fan 1 failure | Replace Fan 1. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |
| 3a1602xxxxxx | Fan 2 failure | Replace Fan 2. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |
| 3a1603xxxxxx | Fan 3 failure | Replace Fan 3. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |
| 3a1604xxxxxx | Fan 4 failure | Replace Fan 4. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |
| 3a2600xxxxxx | The water-cooled system shut down due to too many processor core sensors reading a temperature at or above the maximum temperature that is allowed. | At least one processor is overheating. Go to Resolving an over temperature problem for a water-cooled 8335-GTB system. |
| 3a260xyyyyyy, where x = 1, 2, or 3 | System shut down due to one or more missing or failed fans | The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |
| 3a2604yyyyyy | All of the fans are missing or failed | Ensure that the fan power cable and the disk and fan signal cable are seated properly. If the problem persists, replace the following items, one at a time, until the problem is resolved: (1) power riser with time-of-day battery slot, (2) fan power cable, (3) disk and fan signal cable, (4) disk drive and fan card. Note: Go to 8335-GTB locations to identify the physical location and removal and replacement procedure. |

Table 4. OEM record c0 specific log information, description, and service action for an 8348-21C

| OEM record c0 specific log information | Description | Service action |
| --- | --- | --- |
| 320a01xxxxxx | Phy read failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320a02xxxxxx | Phy speed and duplex failure | If you are viewing this event from the BMC, the missing or defective cable is now operational and no service action is required. Otherwise, replace the missing or failed LAN cable that attaches the console to the system. |
| 320exxxxxxxx | OCC reset required | This event is for information only. No service action is required. |
| 3a0400xxxxxx | Chassis soft power off | A user initiated power off request occurred. No service action is required. |
| 3a0402xxxxxx | Chassis soft reboot | A user initiated power off request occurred. No service action is required. |
| 3a0701xxxxxx | Request for PNOR access | This event is for information only. No service action is required. |
| 3a0702xxxxxx | Release of PNOR access | This event is for information only. No service action is required. |
| 3a1100xxxxxx | Fan thread stopped | This event is for information only. No service action is required. |
| 3a1101xxxxxx | Fan thread started | This event is for information only. No service action is required. |
| 3a1503xxxxxx | Primary side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1504xxxxxx | Golden side boot failed | Go to Resolving a system firmware boot failure. |
| 3a1601xxxxxx | Fan 1 failure | Replace Fan 1. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a1602xxxxxx | Fan 2 failure | Replace Fan 2. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a1603xxxxxx | Fan 3 failure | Replace Fan 3. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a1604xxxxxx | Fan 4 failure | Replace Fan 4. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a1605xxxxxx | Fan 5 failure | Replace Fan 5. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a260xyyyyyy, where x = 1, 2, 3, or 4 | System shut down due to one or more missing or failed fans | The OEM record c0 specific log information is 3a260xyyyyyy, where x is the number of fans that were missing or failed when the system was shut down. The system cannot be powered on with missing or failed fans. If any SEL events were logged with OEM record c0 specific log information 3a16xxxxxxxx, complete the service action indicated in this table. Otherwise, replace the fans, one at a time, until the problem is resolved. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |
| 3a2605yyyyyy | All of the fans are missing or failed | Replace the disk drive backplane. Go to 8348-21C locations to identify the physical location and removal and replacement procedure. |

- One or more SEL events might require a service action. These events require a service action if they meet the following criteria:
- A service action keyword is present. For a list of service action keywords, see Identifying service action keywords in system event logs.
- Asserted is in the description.
- OEM record is not in the description.
- Did you find one or more SEL events that require a service action as defined in step 13?
Yes: Continue with the next step.
No: This ends the procedure.
- The service actions for all of the events that were identified in step 13 must be performed to successfully complete the repair. Record the SEL record IDs for the events that you identified in step 13. The SEL record ID is indicated by the leftmost digits of the SEL. Use the ipmitool command to display SEL details for each SEL record ID that you recorded.
- To display SEL details by using an in-band network, use the following command:
ipmitool sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
- To display SEL details remotely over the LAN, use the following command:
ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel get <SEL record ID>
Note: The SEL record ID must be entered in hexadecimal format. For example: 0x1a.
The sensor ID field contains sensor information in the format sensor name (sensor ID). Record the sensor name, sensor ID, and event description. Then, use this information to determine the service action to perform:
- If your system is an 8335-GCA or 8335-GTA, go to Identifying a service action by using sensor and event information for the 8335-GCA and 8335-GTA.
- If your system is an 8335-GTB, go to Identifying a service action by using sensor and event information for the 8335-GTB.
- If your system is an 8348-21C, go to Identifying a service action by using sensor and event information for the 8348-21C.
This ends the procedure.
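The branching in steps 2, 4, and 11 depends only on which OEM record types appear in the listing, so the entry point into the rest of the procedure can be summarized in a few lines. In this sketch the step numbers count the top-level steps in order, as the cross-references in the procedure do. The function name is hypothetical.

```python
def first_branch_step(found_record_types):
    """Return the step that follows the OEM record scans.

    found_record_types is the set of OEM record types seen in the SELs,
    for example {"df", "c0"}. The order of the checks follows the procedure:
    de first (step 2), then df (step 4), then c0 (steps 10 and 11).
    """
    if "de" in found_record_types:
        return 3   # Use Table 1.
    if "df" in found_record_types:
        return 5   # Look for related events near the df event.
    if "c0" in found_record_types:
        return 12  # Use Table 2, 3, or 4 for the system model.
    return 13      # Check the remaining events against the criteria.


step_for_de_and_c0 = first_branch_step({"de", "c0"})
step_for_none = first_branch_step(set())
```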