A disk unit seems to have stopped communicating with the system.
The system has stopped normal operation until the cause of the
disk unit failure is found and corrected. Ensure you have read the Danger notices
in Licensed internal code (LIC) isolation procedures before continuing with this procedure.
If the disk unit that stopped communicating with the system has
mirrored protection active, normal operation of the system stops for one
to two minutes. Then the system suspends mirrored protection for that disk
unit and continues normal operation. See
Disk unit recovery procedures for more information on
systems with mirrored protection.
Note: Do not power off the system or partition
using the white button, function 08, ASMI, or HMC immediate power-off when
performing this procedure. If this procedure or other isolation procedures
referenced by this procedure direct you to IPL or power off the system,
- perform a partition main storage dump (see Performing a platform or main storage dump),
or
- if additional dump information is not needed, perform a function 03 IPL
or restart the system or partition using the HMC.
- If the system has logical partitions, perform this procedure from
the logical partition that reported the problem. To determine if the system
has logical partitions, go to Determining if the system has logical partitions before
continuing with this procedure.
- Was a problem summary form completed for this problem?
- No: Continue with the next step.
- Yes: Use the problem summary form information
and go to step 4.
- Fill out a problem reporting form completely, following the instructions provided.
- Recovery from a device command time-out may have
caused the communications loss condition (indicated by an SRC on the control
panel or in the HMC). This communications loss condition has the
following symptoms:
- The A6xx SRC does not increment within two minutes.
- The system continues to run normally after it recovers from the communications
loss condition and the reference code is cleared from the control panel.
Does the communications loss condition have the above symptoms?
- Yes: Continue with the next step.
- No: Go to step 6.
- Verify that all Licensed Internal Code PTFs have been applied to
the system. Apply any Licensed Internal Code PTFs that have not
been applied to the system. Does the intermittent condition continue?
- Yes: Print all product activity logs. Print the
LIC logs with a major code of 1000. Provide this information to your next
level of support. This ends the procedure.
- No: This ends the procedure.
- A manual reset of the IOP may clear the attention
reference code. Perform the following:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 25 and press Enter.
- Select Function 26 and press Enter.
- Select Function 67 and press Enter to
reset the IOP.
- Wait 10 minutes.
- Select Function 25 and press Enter to
disable the service functions on the control panel.
If you are working from the HMC:
- In the Navigation Area, open the Service Applications folder.
- Select Service Focal Point.
- In the contents area, select Service Utilities.
- In the Service Utilities window, select the system you are working
on.
- Select .
- Select the logical partition, and then select Partition
Functions.
- Select Disk Unit IOP Reset/Reload (67).
- Wait 10 minutes.
Did the reset successfully clear the control panel SRC or HMC panel
value and can commands be entered on the partition console?
- No: Continue with the next step.
- Yes: Look for a Service Action Log (SAL) entry
since the last IPL, and use it to fix the problem (see Using the Service Action Log).
If a B6xx 5090 SRC occurred since the last IPL, look for other SRC entries
and take action on them first. This ends the procedure.
- Is the SRC the same reference code that sent you here?
- Yes: The same reference code occurred. Continue
with the next step.
- No: Collect all words of the reference code and
go to Reference codes to resolve the new problem. This ends the procedure.
- Powering off and powering on the affected IOP domain may clear
the attention reference code. Perform the following:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 25 and press Enter.
- Select Function 26 and press Enter.
- Select Function 68 and press Enter to
power off the domain.
- After the domain has been powered off or 10 minutes have passed,
select Function 69 and press Enter to power on the
domain.
- Wait 10 minutes.
- Select Function 25 and press Enter to
disable the service functions on the control panel.
If you are working from the HMC:
- In the Navigation Area, open the Service Applications folder.
- Select Service Focal Point.
- In the contents area, select Service Utilities.
- In the Service Utilities window, select the system you are working
on.
- Select .
- Select the logical partition, and then select Partition
Functions.
- Select Power off domain (68).
- After the domain has been powered off or 10 minutes have passed,
select Power on domain (69).
- Wait 10 minutes.
Did this successfully clear the control panel SRC or HMC panel value,
and can commands be entered on the partition console?
- No: Continue with the next step.
- Yes: Look for a SAL entry since the last IPL,
and use it to fix the problem (see Using the Service Action Log).
If a B6xx 5090 SRC occurred since the last IPL, look for other SRC entries
and take action on them first. This ends the procedure.
- Is the SRC the same reference code that sent you here?
- Yes: The same reference code occurred. Continue
with the next step.
- No: Collect all words of the reference code and
go to Reference codes to resolve the new problem. This ends the procedure.
- Is the disk unit that reported this problem a virtual disk unit?
- Yes: Continue with the next step.
- No: Continue with step 12.
- Using the HMC, check the status of the I/O hosting partition.
Does the partition have a status of "running"?
- Yes: Check for configuration problems and resolve them. If there
are no configuration problems, then continue with step 10.
- No: Fix any problems found in the I/O hosting partition. If
that does not resolve the problem that sent you here, then continue with step 12.
- Perform a main storage dump, then perform an IPL
by performing the following:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 22 and press Enter to
dump the main storage to the load-source disk unit.
- Wait for SRC A100 300x to occur, indicating that the dump is
complete.
- Then perform an IPL to DST (see Performing an IPL to DST).
If you are working from the HMC:
- In the Navigation Area, open Server and Partition.
- Select Server Management.
- In the contents area, open the server on which the logical partition
is located.
- Select Partitions.
- Right-click the logical partition profile and select Restart
Partition.
- In the Restart Partition window, select the Dump restart
option.
Does a different SRC occur, or does a display appear
on the console showing reference codes?
- No: Continue with the next step.
- Yes: Go to Reference codes to service the new problem. This
ends the procedure.
- Does the same reference code occur?
- Yes: Continue with the next step.
- No: The problem is intermittent. Perform the following:
- Print the system product activity log for the magnetic storage subsystem
and print the LIC logs with a major code of 1000.
- Copy the main storage dump to removable media (see Copying a current
main storage dump).
- Contact your next level of support and provide them with this information. This
ends the procedure.
- Are characters 7-8 of the top 16 character line of function 12
(2 rightmost characters of word 2) equal to 13 or 17?
- Yes: Continue with the next step.
- No: Go to step 17.
- Use the word 1 through 9 information recorded on
the Problem summary form to determine the disk unit that stopped communicating
with the system:
- Is the disk unit reference code 0000?
- No: Using the information from step 15,
find the table for the indicated disk unit type in the Reference codes topic.
Perform problem analysis for the disk unit reference code. This
ends the procedure.
- Yes: Perform the following steps:
- Determine the IOP type by using characters 9-12 of the bottom 16 character
line of function 13 (4 leftmost characters of word 9).
- Find the unit reference code table for the IOP type in the Reference codes topic.
Determine the unit reference code by using characters 13-16 of the bottom
16 character line of function 13 (4 rightmost characters of word 9).
- Perform problem analysis for the unit reference code. This
ends the procedure.
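The character positions used in the steps above can be sketched in code. This is a minimal illustration, not part of the service procedure: it assumes the four 16-character lines of functions 12 and 13 have been typed in as strings, and the function names and sample values are hypothetical. The mapping of words 2, 3, 5, 6, 7, 8 and 9 follows the positions named in this procedure; the position of word 4 (characters 1-8 of the bottom line of function 12) is inferred from that pattern.

```python
def split_src_words(fn12_top, fn12_bottom, fn13_top, fn13_bottom):
    """Return {word_number: 8-character string} for SRC words 2 through 9."""
    words = {}
    first_word = 2
    for line in (fn12_top, fn12_bottom, fn13_top, fn13_bottom):
        text = line.replace(" ", "").upper()
        if len(text) != 16:
            raise ValueError(f"expected 16 characters, got {len(text)}: {line!r}")
        words[first_word] = text[:8]       # characters 1-8 of the line
        words[first_word + 1] = text[8:]   # characters 9-16 of the line
        first_word += 2
    return words


def decode_word9(word9):
    """Split word 9 into the IOP type (4 leftmost characters) and the
    unit reference code (4 rightmost characters)."""
    return {"iop_type": word9[:4], "unit_reference_code": word9[4:]}


# Made-up display lines, for illustration only.
words = split_src_words(
    "0000001300000000",   # function 12, top line: words 2 and 3
    "0000000000000000",   # function 12, bottom line: words 4 and 5
    "0001000000020000",   # function 13, top line: words 6 and 7
    "6607005027483002",   # function 13, bottom line: words 8 and 9
)
print(words[2][-2:])             # the 2 rightmost characters of word 2
print(decode_word9(words[9]))
```

The values recorded on the Problem summary form remain the authoritative source; a helper like this only reduces transcription mistakes when counting character positions.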
- Are characters 7-8 of the top 16 character line
of function 12 (the two rightmost characters of word 2) equal to 27?
- Yes: Continue with the next step.
- No: Go to step 21.
- Use the word 1 through 9 information recorded on
the Problem summary form to determine the disk unit that stopped communicating
with the system:
- Is the disk unit reference code 0000?
- Are characters 9-16 of the bottom 16 character line of function
13 (word 9) B6xx 51xx?
- Yes: Use the B6xx table in the Reference codes topic.
Perform problem analysis for the 51xx unit reference code. This
ends the procedure.
- No: Using the information from step 18,
find the table for the indicated disk unit type in the Reference codes topic.
Perform problem analysis for the disk unit reference code. This
ends the procedure.
- Are the 2 rightmost characters of word 2 on the
Problem summary form equal to 62?
- No: Use the information in characters 9-16 of
the bottom 16 character line of function 13 (word 9) and go to the Reference codes topic.
Use this information instead of the information in word 1 for the reference
code. This ends the procedure.
- Yes: Continue with the next step.
- Are characters 9-16 of the top 16 character line of function 12
(word 3) equal to 00010004?
- Yes: Continue with the next step.
- No: Go to step 25.
- Are characters 13-16 of the bottom 16 character line of function
12 (4 rightmost characters of word 5) equal to 0000?
- No: Continue with the next step.
- Yes: Go to step 26.
- Note the following:
- Characters 13-16 of the bottom 16 character line of function 12 (4 rightmost
characters of word 5) contain the disk unit reference code.
- Characters 1-8 of the top 16 character line of function 13 (word 6) contain
the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word 7) contain
the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13 (word 8)
contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom
16 character line of function 13 - 4 leftmost characters of word 8) in the Reference
codes topic, and use characters 13-16 of the bottom 16 character line
of function 12 (4 rightmost characters of word 5) as the unit reference code. This
ends the procedure.
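As a hedged sketch of the notes above, the following groups the fields carried in words 5 through 8. The class name, function name, and sample values are hypothetical. Only the disk unit type is split out of word 8 (its 4 leftmost characters); the remaining characters are kept together because this procedure does not say how level and model are divided.

```python
from typing import NamedTuple


class DiskUnitDetails(NamedTuple):
    unit_reference_code: str        # 4 rightmost characters of word 5
    disk_unit_address: str          # word 6
    iop_direct_select_address: str  # word 7
    disk_unit_type: str             # 4 leftmost characters of word 8
    level_and_model: str            # rest of word 8, left undivided


def disk_unit_details(word5, word6, word7, word8):
    """Collect the disk unit fields from SRC words 5 through 8."""
    for number, word in ((5, word5), (6, word6), (7, word7), (8, word8)):
        if len(word) != 8:
            raise ValueError(f"word {number} must be 8 characters: {word!r}")
    return DiskUnitDetails(
        unit_reference_code=word5[4:],
        disk_unit_address=word6,
        iop_direct_select_address=word7,
        disk_unit_type=word8[:4],
        level_and_model=word8[4:],
    )


# Made-up words, for illustration only.
details = disk_unit_details("00003010", "00010000", "00020000", "66070050")
print(details.disk_unit_type, details.unit_reference_code)
```

The disk unit type selects the table in the Reference codes topic, and the unit reference code selects the entry within it.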
- Are characters 9-16 of the top 16 character line
of function 12 (word 3) equal to 0002000D?
- Yes: Continue with the next step.
- No: Use the information in characters 9-16 of the
bottom 16 character line of function 13 (word 9), instead of the information
in word 1 for the reference code, and go to the Reference codes topic.
- Characters 1-8 of the top 16 character line of function 13 (word 6) may
contain the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word 7) may
contain the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13 (word 8)
may contain the disk unit type, level and model number. This ends
the procedure.
- Note the following:
- Characters 1-8 of the top 16 character line of function 13 (word 6) contain
the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word 7) contain
the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13 (word 8)
contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the bottom
16 character line of function 13, the 4 leftmost characters of word 8) in the Reference
codes topic and use 3002 as the unit reference code. Exchange the FRUs
for URC 3002 one at a time. This ends the procedure.
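The branching on words 2, 3 and 5 in the last part of this procedure can be summarized as a small decision function. This is an illustrative sketch only: the function name and return strings are hypothetical, and the step numbers are taken from the cross-references in this procedure (steps 15, 17, 18, 21, 25 and 26) or inferred from "Continue with the next step" (step 24). It does not replace reading each step.

```python
def next_action(word2, word3, word5):
    """Return where the procedure leads, given SRC words 2, 3 and 5
    as 8-character strings."""
    word2, word3, word5 = word2.upper(), word3.upper(), word5.upper()
    code = word2[-2:]                 # 2 rightmost characters of word 2
    if code in ("13", "17"):
        return "step 15"
    if code == "27":
        return "step 18"
    if code != "62":
        return "use word 9 as the reference code"
    if word3 == "00010004":
        # The 4 rightmost characters of word 5 hold the disk unit reference code.
        if word5[-4:] != "0000":
            return "step 24"
        return "step 26"
    if word3 == "0002000D":
        return "step 26"
    return "use word 9 as the reference code"


# Made-up words, for illustration only.
print(next_action("00000062", "00010004", "00003010"))
print(next_action("00000062", "0002000D", "00000000"))
```

Keeping the comparison on exact 8-character values mirrors the wording of the steps, which compare whole words rather than numeric values.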