Use this procedure to locate defective FRUs not found by normal diagnostics. It should be used when the service processor posts a failure and halts the IPL before server firmware standby is reached.
To perform this procedure, run diagnostics on a minimally configured system. If a failure is detected on the minimally configured system, the remaining FRUs are exchanged one at a time until the failing FRU is identified. If a failure is not detected, FRUs are added back until the failure occurs. The failure is then isolated to the failing FRU.
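The strategy described above can be sketched in code. This is only an illustration, not part of the service procedure: `isolate_failing_fru` and the `boots_ok` callback are hypothetical names, and `boots_ok` stands in for powering on the system and checking whether it reaches server firmware standby without an error code.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def isolate_failing_fru(
    minimal: Sequence[str],
    removed: Sequence[str],
    boots_ok: Callable[[List[str]], bool],
) -> Tuple[str, Optional[str]]:
    """Illustrative sketch of minimal-configuration fault isolation.

    boots_ok is a hypothetical stand-in for "power on and check that the
    system reaches standby with no error code".
    """
    installed = list(minimal)

    # Failure on the minimal configuration: the fault lies in the FRUs that
    # are still installed, which are then exchanged one at a time.
    if not boots_ok(installed):
        return ("fault-in-minimal-configuration", None)

    # No failure: add FRUs back one at a time until the failure reappears.
    for fru in removed:
        installed.append(fru)
        if not boots_ok(installed):
            return ("fault-in-added-fru", fru)

    return ("no-fault-found", None)
```

For example, with a simulated fault in `MC07`, calling the function with `minimal=["MC01", "MC02"]` and `removed=["MC03", "MC07", "MC08"]` reports `MC07` as the FRU whose addition brought the failure back.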
This memory problem-determination procedure isolates memory subsystem failures. When memory problem isolation is complete, memory cards exhibiting a failure will have been reseated or replaced.
Setting | Description |
---|---|
Monitoring (also called surveillance) | From the ASMI menu, expand the System Configuration menu, then click on Monitoring. Disable both types of surveillance. |
Auto power restart (also called unattended start mode) | From the ASMI menu, expand Power/Restart Control, then click on Auto Power Restart, and set it to disabled. |
Wake on LAN | From the ASMI menu, expand Wake on LAN, and set it to disabled. |
Call Out | From the ASMI menu, expand the Service Aids menu, then click on Call-Home/Call-In Setup. Set the call-home system port and the call-in system port to disabled. |
Record the error code(s) and location code(s) that sent you to this procedure.
Use the HMC to power off the system.
Examine the amber logic-power LEDs on all of the processor subsystem DCAs.
Are all of the amber logic-power LEDs on all of the processor subsystem DCAs off?
Replace the following memory cards, one at a time, in the order listed, if present.
Turn on the power.
Did the system stop with the same error code as recorded in step PFW1546-1?
Turn off the power.
Examine the amber logic-power LEDs on all of the processor subsystem DCAs.
Are all of the amber logic-power LEDs on all of the processor subsystem DCAs off?
| 16W System | 32W System | 48W System | 64W System |
|---|---|---|---|
| Node 0 MC01 | Node 0 MC01 | Node 0 MC01 | Node 0 MC01 |
| Node 0 MC02 | Node 0 MC02 | Node 0 MC02 | Node 0 MC02 |
| Node 0 MC03 | Node 1 MC01 | Node 1 MC01 | Node 1 MC01 |
| Node 0 MC04 | Node 1 MC02 | Node 1 MC02 | Node 1 MC02 |
| Node 0 MC05 | Node 0 MC03 | Node 2 MC01 | Node 2 MC01 |
| Node 0 MC06 | Node 0 MC04 | Node 2 MC02 | Node 2 MC02 |
| Node 0 MC15 | Node 1 MC03 | Node 0 MC03 | Node 3 MC01 |
| Node 0 MC16 | Node 1 MC04 | Node 0 MC04 | Node 3 MC02 |
| Node 0 MC08 | Node 0 MC05 | Node 1 MC03 | Node 0 MC03 |
| Node 0 MC09 | Node 0 MC06 | Node 1 MC04 | Node 0 MC04 |
| Node 0 MC12 | Node 1 MC05 | Node 2 MC03 | Node 1 MC03 |
| Node 0 MC13 | Node 1 MC06 | Node 2 MC04 | Node 1 MC04 |
| Node 0 MC07 | Node 0 MC15 | Node 0 MC05 | Node 2 MC03 |
| Node 0 MC10 | Node 0 MC16 | Node 0 MC06 | Node 2 MC04 |
| Node 0 MC11 | Node 1 MC15 | Node 1 MC05 | Node 3 MC03 |
| Node 0 MC14 | Node 1 MC16 | Node 1 MC06 | Node 3 MC04 |
|  | Node 0 MC08 | Node 2 MC05 | Node 0 MC05 |
|  | Node 0 MC09 | Node 2 MC06 | Node 0 MC06 |
|  | Node 1 MC08 | Node 0 MC15 | Node 1 MC05 |
|  | Node 1 MC09 | Node 0 MC16 | Node 1 MC06 |
|  | Node 0 MC12 | Node 1 MC15 | Node 2 MC05 |
|  | Node 0 MC13 | Node 1 MC16 | Node 2 MC06 |
|  | Node 1 MC12 | Node 2 MC15 | Node 3 MC05 |
|  | Node 1 MC13 | Node 2 MC16 | Node 3 MC06 |
|  | Node 0 MC07 | Node 0 MC08 | Node 0 MC15 |
|  | Node 0 MC10 | Node 0 MC09 | Node 0 MC16 |
|  | Node 1 MC07 | Node 1 MC08 | Node 1 MC15 |
|  | Node 1 MC10 | Node 1 MC09 | Node 1 MC16 |
|  | Node 0 MC11 | Node 2 MC08 | Node 2 MC15 |
|  | Node 0 MC14 | Node 2 MC09 | Node 2 MC16 |
|  | Node 1 MC11 | Node 0 MC12 | Node 3 MC15 |
|  | Node 1 MC14 | Node 0 MC13 | Node 3 MC16 |
|  |  | Node 1 MC12 | Node 0 MC08 |
|  |  | Node 1 MC13 | Node 0 MC09 |
|  |  | Node 2 MC12 | Node 1 MC08 |
|  |  | Node 2 MC13 | Node 1 MC09 |
|  |  | Node 0 MC07 | Node 2 MC08 |
|  |  | Node 0 MC10 | Node 2 MC09 |
|  |  | Node 1 MC07 | Node 3 MC08 |
|  |  | Node 1 MC10 | Node 3 MC09 |
|  |  | Node 2 MC07 | Node 0 MC12 |
|  |  | Node 2 MC10 | Node 0 MC13 |
|  |  | Node 0 MC11 | Node 1 MC12 |
|  |  | Node 0 MC14 | Node 1 MC13 |
|  |  | Node 1 MC11 | Node 2 MC12 |
|  |  | Node 1 MC14 | Node 2 MC13 |
|  |  | Node 2 MC11 | Node 3 MC12 |
|  |  | Node 2 MC14 | Node 3 MC13 |
|  |  |  | Node 0 MC07 |
|  |  |  | Node 0 MC10 |
|  |  |  | Node 1 MC07 |
|  |  |  | Node 1 MC10 |
|  |  |  | Node 2 MC07 |
|  |  |  | Node 2 MC10 |
|  |  |  | Node 3 MC07 |
|  |  |  | Node 3 MC10 |
|  |  |  | Node 0 MC11 |
|  |  |  | Node 0 MC14 |
|  |  |  | Node 1 MC11 |
|  |  |  | Node 1 MC14 |
|  |  |  | Node 2 MC11 |
|  |  |  | Node 2 MC14 |
|  |  |  | Node 3 MC11 |
|  |  |  | Node 3 MC14 |
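The four columns of the table above appear to follow one pattern: the memory cards are taken in pairs, in the pair order of the 16W column, and each pair is listed for every node before moving to the next pair. The sketch below regenerates the columns under that reading. The names `PAIR_ORDER` and `card_order` are hypothetical, and the table itself remains the authoritative reference.

```python
from typing import List

# Card pairs in the order they appear in the 16W column of the table.
# This pairing is inferred from the table, not stated by the procedure.
PAIR_ORDER = [(1, 2), (3, 4), (5, 6), (15, 16), (8, 9), (12, 13), (7, 10), (11, 14)]


def card_order(node_count: int) -> List[str]:
    """Return the card sequence for a system with node_count nodes (1 to 4)."""
    order = []
    for pair in PAIR_ORDER:
        # Each pair is listed for every node before the next pair starts.
        for node in range(node_count):
            for card in pair:
                order.append(f"Node {node} MC{card:02d}")
    return order


if __name__ == "__main__":
    # Print the 32W column (two nodes) for comparison with the table.
    for entry in card_order(2):
        print(entry)
```

Calling `card_order(1)`, `card_order(2)`, `card_order(3)`, and `card_order(4)` yields the 16W, 32W, 48W, and 64W columns respectively, which makes it easy to cross-check a hand-copied list against the table.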
Reinstall the nodes into the system.
After the system has been reduced to the minimum memory configuration for the number of nodes installed, power on the system using the HMC.
Does the system boot to server firmware standby with no error codes on the control panel?
The following steps will isolate the failing node. The nodes will be removed, repopulated with memory, and added back to the system one at a time.
Use the HMC to power down the system.
Remove all nodes from the system.
Reconfigure the memory in the first node according to the first column in the table in PFW1546-11.
Reinstall the first node, then use the HMC to power on the system.
Does the system boot to server firmware standby with no error codes on the control panel?
The failure has been isolated to the node that was just reinstalled. Do the following in the order listed:
Does the system boot to server firmware standby with no error codes on the control panel?
Use the HMC to power down the system.
Is there a second node in the system?
Does the system boot to server firmware standby with no error codes on the control panel?
Use the HMC to power down the system.
Is there a third node in the system?
Does the system boot to server firmware standby with no error codes on the control panel?
Use the HMC to power down the system.
Is there a fourth node in the system?
Does the system boot to server firmware standby with no error codes on the control panel?
The system now boots to server firmware standby with the minimum configuration of memory cards. Go to PFW1546-18 to start adding the remaining memory back into the system.
Power off the system using the HMC.
Add another set of four memory cards back in order, one set at a time, according to the table in PFW1546-11.
After each set of four is added, reinstall the node into the system.
Does the system boot to server firmware standby with no error codes on the control panel?
The set of four memory cards that was reinstalled in PFW1546-18, or the slots into which they were installed, are causing the failure.
Do the following:
Does the system boot to server firmware standby with no error codes on the control panel?
One or more of the memory cards in the last set installed is bad. Using the known good set, swap out the memory cards one at a time until the failing card is isolated. Replace it. Return the system to its original configuration, then go to MAP 0410: Repair Checkout. This ends the procedure.
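The final swap step can be sketched as a simple loop. This is a minimal illustration that assumes exactly one card in the suspect set is bad. `find_bad_card` and `boots_ok` are hypothetical names, with `boots_ok` standing in for powering on the system and checking that it reaches server firmware standby with no error code.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_card(
    suspect_set: Sequence[str],
    known_good: Sequence[str],
    boots_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Swap in one known-good card at a time and return the card whose
    removal lets the system boot. Assumes a single bad card."""
    for index, card in enumerate(suspect_set):
        trial = list(suspect_set)
        trial[index] = known_good[index]  # replace only this one card
        if boots_ok(trial):
            return card  # boot succeeds without this card, so it is the bad one
    return None  # no single swap cleared the failure
```

If the loop returns `None`, no single swap cleared the failure, which suggests that the single-bad-card assumption does not hold for that set.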