A disk unit seems to have stopped communicating with the
system.
The system has stopped normal operation until the cause
of the disk unit failure is found and corrected. Ensure you have read
the Danger notices in Licensed Internal Code isolation procedures before continuing with this procedure.
If the disk unit that stopped communicating with the system has
mirrored protection active, normal operation of the system stops
for one to two minutes. Then the system suspends mirrored protection
for that disk unit and continues normal operation.
Note: Do not power off the system or partition using the white button,
function 08, ASMI, or management console immediate power-off when
performing this procedure. If this procedure or other isolation
procedures referenced by this procedure direct you to IPL or power off
the system, do one of the following:
- Perform a partition main storage dump (see Performing dumps).
- If additional dump information is not needed, perform a function
03 IPL or restart the system or partition using the management console.
- If the system has logical partitions, perform this procedure
from the logical partition that reported the problem. To determine
if the system has logical partitions, go to Determining if the system has
logical partitions before continuing with this procedure.
- Was a problem summary form completed for this problem?
- No: Continue with the next step.
- Yes: Use the problem summary form information
and go to step 4.
- Fill out a Problem Reporting Form completely, following the
instructions provided.
- Recovery from a device command time-out
may have caused the communications loss condition (indicated by an
SRC on the control panel or in the management console). This
communications loss condition has the following symptoms:
- The A6xx SRC does not increment within two minutes.
- The system continues to run normally after it recovers from the
communications loss condition and the reference code is cleared from
the control panel.
Does the communication loss condition have the above symptoms?
- Yes: Continue with the next step.
- No: Go to step 6.
- Verify that all Licensed Internal Code PTFs have been applied
to the system. Apply any Licensed Internal Code PTFs that
have not been applied to the system. Does the intermittent condition
continue?
- Yes: Print all product activity logs.
Print the LIC logs with a major code of 1000. Provide this information
to your next level of support. This ends the procedure.
- No: This ends the procedure.
- Is the storage hosted by another partition?
- Yes: Contact your next level of support.
- No: Continue with the next step.
- A manual reset of the IOP may clear the
attention reference code. Perform the following steps:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 25 and press
Enter.
- Select Function 26 and press
Enter.
- Select Function 67 and press
Enter to reset the IOP.
- Wait 10 minutes.
- Select Function 25 and press
Enter to disable the service functions on the control panel.
If you are working from the HMC:
- In the navigation area, select Systems Management.
- In the contents area, open the server on which the logical
partition is located.
- In the contents area, select the logical partition.
- Select .
- Select (67) Disk Unit IOP Reset/Reload.
- Wait 10 minutes.
Did the reset successfully clear the control panel SRC or
management console panel value and can commands be entered on the
partition console?
- No: Continue with the next step.
- Yes: Look for a Service Action Log (SAL)
entry since the last IPL, and use it to fix the problem (see Searching the service action
log). If a B6xx 5090 SRC occurred since the last IPL,
look for other SRC entries and take action on them first. This
ends the procedure.
- Is the SRC the same reference code that sent you here?
- Yes: The same reference code occurred.
Continue with the next step.
- No: Collect all words of the reference
code and perform problem analysis to resolve the new problem. This
ends the procedure.
- Powering off and powering on the affected IOP domain may
clear the attention reference code. Perform the following steps:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 25 and press
Enter.
- Select Function 26 and press
Enter.
- Select Function 68 and press
Enter to power off the domain.
- After the domain has been powered off or 10 minutes
have passed, select Function 69 and press Enter
to power on the domain.
- Wait 10 minutes.
- Select Function 25 and press
Enter to disable the service functions on the control panel.
If you are working from the HMC:
- In the navigation area, select Systems Management.
- In the contents area, open the server on which the logical
partition is located.
- In the contents area, select the logical partition.
- Select .
- Select (68) Concurrent Maintenance Power
Off Domain.
- After the domain has been powered off or 10 minutes
have passed, select (69) Concurrent Maintenance Power On
Domain.
- Wait 10 minutes.
Did this successfully clear the control panel SRC or management
console panel value, and can commands be entered on the partition
console?
- No: Continue with the next step.
- Yes: Look for a SAL entry since the last
IPL, and use it to fix the problem (see Searching the service action
log). If a B6xx 5090 SRC occurred since the last IPL,
look for other SRC entries and take action on them first. This
ends the procedure.
- Is the SRC the same reference code that sent you here?
- Yes: The same reference code occurred.
Continue with the next step.
- No: Collect all words of the reference
code and perform problem analysis to resolve the new problem. This
ends the procedure.
- Perform a main storage dump, and then perform an IPL, as follows:
If you are working from the control panel:
- Select Manual mode on the control panel.
- Select Function 22 and press
Enter to dump the main storage to the load-source disk unit.
- Wait for SRC A100 300x to occur, indicating that the
dump is complete.
- Then perform an IPL to DST (see Performing an IPL to dedicated service tools).
If you are working from the HMC:
- In the navigation area, select Systems Management.
- In the contents area, open the server on which the logical
partition is located.
- In the contents area, select the logical partition.
- Select Operations > Restart.
- In the Restart Partition window, select the Dump restart
option.
Does a different SRC occur, or does a display
appear on the console showing reference codes?
- No: Continue with the next step.
- Yes: Perform problem analysis to correct
the new problem. This ends the procedure.
- Does the same reference code occur?
- Yes: Continue with the next step.
- No: The problem is intermittent. Perform
the following:
- Print the system product activity log for the magnetic storage
subsystem and print the LIC logs with a major code of 1000.
- Copy the main storage dump to removable media (see Managing dumps).
- Contact your next level of support and provide them with this
information. This ends the procedure.
- Are characters 7-8 of the top 16 character line of function
12 (2 rightmost characters of word 2) equal to 13 or 17?
- Yes: Continue with the next step.
- No: Go to step 16.
- Use the word 1 through 9 information recorded
on the Problem summary form to determine the disk unit that stopped
communicating with the system:
- Is the disk unit reference code 0000?
- No: Using the information from step 14, find the table for the indicated
disk unit type. Perform problem analysis for the disk unit reference
code. This ends the procedure.
- Yes: Perform the following steps:
- Determine the IOP type by using characters 9-12 of the bottom
16 character line of function 13 (4 leftmost characters of word 9).
- Find the unit reference code table for the IOP type. Determine
the unit reference code by using characters 13-16 of the bottom 16
character line of function 13 (4 rightmost characters of word 9).
- Perform problem analysis for the unit reference code. This
ends the procedure.
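The character positions used in these steps can be sketched in code. The following is a minimal illustration, assuming each of the four panel lines (function 12 top and bottom, function 13 top and bottom) holds two 8-character words, so that the lines carry words 2 through 9 in order. The position of word 4 is inferred from that pattern, and the function name `split_words` and all sample values are hypothetical.

```python
def split_words(func12_top, func12_bottom, func13_top, func13_bottom):
    """Split the four 16-character panel lines into SRC words 2 through 9.

    Assumed layout (word 4 is inferred from the pattern):
      function 12 top    -> words 2 and 3
      function 12 bottom -> words 4 and 5
      function 13 top    -> words 6 and 7
      function 13 bottom -> words 8 and 9
    """
    words = {}
    number = 2
    for line in (func12_top, func12_bottom, func13_top, func13_bottom):
        text = line.replace(" ", "").upper()
        if len(text) != 16:
            raise ValueError("expected 16 characters, got %d" % len(text))
        words[number] = text[:8]       # characters 1-8
        words[number + 1] = text[8:]   # characters 9-16
        number += 2
    return words


# Made-up sample values, for illustration only.
words = split_words(
    "0000001300010004",
    "000000000000FFF4",
    "0001000200030004",
    "66070050B6005120",
)
word2_type = words[2][-2:]   # characters 7-8 of the function 12 top line
iop_type = words[9][:4]      # characters 9-12 of the function 13 bottom line
urc = words[9][4:]           # characters 13-16 of the function 13 bottom line
```

Recording the words this way keeps the character counting in one place, so later checks can refer to a word number rather than to a position on a panel line.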
- Are characters 7-8 of the top 16 character
line of function 12 (the two rightmost characters of word 2) equal
to 27?
- Yes: Continue with the next step.
- No: Go to step 20.
- Use the word 1 through 9 information recorded
on the Problem summary form to determine the disk unit that stopped
communicating with the system:
- Is the disk unit reference code 0000?
- Are characters 9-16 of the bottom 16 character line of
function 13 (word 9) B6xx 51xx?
- Yes: Using the B6xx table, perform problem
analysis for the 51xx unit reference code. This ends the
procedure.
- No: Using the information from step 17, find the table for the indicated
disk unit type. Perform problem analysis for the disk unit reference
code. This ends the procedure.
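The B6xx 51xx comparison on word 9 can be expressed as a pattern match. This is a small sketch that assumes each "x" stands for any single hexadecimal digit; the helper name `is_b6xx_51xx` and the sample values are hypothetical.

```python
import re

# Assumption: "x" in "B6xx 51xx" means any one hexadecimal digit.
B6XX_51XX = re.compile(r"B6[0-9A-F]{2}51[0-9A-F]{2}")


def is_b6xx_51xx(word9):
    """Return True if the 8-character word 9 has the form B6xx 51xx."""
    text = word9.replace(" ", "").upper()
    return B6XX_51XX.fullmatch(text) is not None


# Made-up sample values, for illustration only.
print(is_b6xx_51xx("B600 5120"))
print(is_b6xx_51xx("B6005090"))
```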
- Are the 2 rightmost characters of word 2
on the Problem summary form equal to 62?
- No: Use the information in characters
9-16 of the bottom 16 character line of function 13 (word 9), instead
of the information in word 1, as the reference
code. This ends the procedure.
- Yes: Continue with the next step.
- Are characters 9-16 of the top 16 character line of function
12 (word 3) equal to 00010004?
- Yes: Continue with the next step.
- No: Go to step 24.
- Are characters 13-16 of the bottom 16 character line of
function 12 (4 rightmost characters of word 5) equal to 0000?
- No: Continue with the next step.
- Yes: Go to step 25.
- Note the following:
- Characters 13-16 of the bottom 16 character line of function 12
(4 rightmost characters of word 5) contain the disk unit reference
code.
- Characters 1-8 of the top 16 character line of function 13 (word
6) contain the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word
7) contain the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13
(word 8) contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the
bottom 16 character line of function 13, the 4 leftmost characters of
word 8), and use characters 13-16 of the bottom 16 character line
of function 12 (4 rightmost characters of word 5) as the unit reference
code. This ends the procedure.
- Are characters 9-16 of the top 16 character
line of function 12 (word 3) equal to 0002000D?
- Yes: Continue with the next step.
- No: Use the information in characters 9-16
of the bottom 16 character line of function 13 (word 9), instead of
the information in word 1 for the reference code, and perform problem
analysis.
- Characters 1-8 of the top 16 character line of function 13 (word
6) may contain the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word
7) may contain the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13
(word 8) may contain the disk unit type, level and model number. This
ends the procedure.
- Note the following:
- Characters 1-8 of the top 16 character line of function 13 (word
6) contain the disk unit address.
- Characters 9-16 of the top 16 character line of function 13 (word
7) contain the IOP direct select address.
- Characters 1-8 of the bottom 16 character line of function 13
(word 8) contain the disk unit type, level and model number.
Find the table for the disk unit type (characters 1-4 of the
bottom 16 character line of function 13, the 4 leftmost characters of
word 8), and use 3002 as the unit reference code. Exchange the FRUs
for URC 3002 one at a time. This ends the procedure.