A fix is available
APAR status
Closed as program error.
Error description
A Linux guest hangs during I/O error recovery after a FICON CHPID reset or recovery. The problem occurs only on guests defined with OPTION CHPIDV ONE in the User Directory. It appears after FICON CHPID recovery following a cable pull, port block, or any other condition that causes the CHPID to go away and then come back.
Local fix
Drop into CP READ and re-IPL the guest. This breaks the loop/hang condition.
Problem summary
****************************************************************
* USERS AFFECTED: All z/VM Users of OPTION CHPIDV ONE in User  *
*                 Directory.                                   *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
****************************************************************
* RECOMMENDATION: APPLY PTF                                    *
****************************************************************
The symptom is an infinite loop (with many losses of control) that hangs guest virtual machines, particularly Linux, when the guest OS goes into Reset Notification recovery. This base 6.2 SSI problem was found during device qualification testing on test cases that reset a FICON connection (e.g., cable pulls, port blocking, port toggling, etc.).
A Reset Notification is a term of art used to describe the tap on the shoulder that occurs when a FICON connection becomes available again (after it goes away or is reset). It tells the OS to rediscover the connection to make sure nothing on it has changed.
The specific bug is in VM CCW simulation code and causes the sense data associated with the Reset Notification to never be cleared after the guest reads it, specifically when the guest has CHPID Virtualization enabled (OPTION CHPIDV ONE in support of guest relocation). As a result, the guest OS reads the Reset Notification sense, starts doing I/O to rediscover the port, and runs right back into the same sense data again, which causes the guest to start recovery over. This repeats forever, or until the guest is re-IPLed, at which point a different path finally clears the sense.
Problem conclusion
VM CCW Simulation code in HCPIOV was modified to properly clear the Reset Notification sense data when a guest with OPTION CHPIDV ONE initially reads the sense. I/O dispatching code in HCPIOS was also changed to reflect the Reset Notification error to a guest with CHPIDV ONE in a more timely manner.
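The difference between the failing and the corrected behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HCPIOV or HCPIOS logic: the class `SimulatedChannelPath`, the flag `clear_sense_on_read`, the label `RESET_NOTIFICATION`, and the attempt cap are all hypothetical names introduced for illustration. The model keeps one pending sense condition per path and lets a guest repeat "read sense, then retry rediscovery I/O" a bounded number of times.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label for the pending condition; not a real CP constant.
RESET_NOTIFICATION = "reset-notification"


@dataclass
class SimulatedChannelPath:
    """Toy stand-in for a virtualized channel path (illustrative only)."""
    clear_sense_on_read: bool            # False models the bug, True models the fix
    pending_sense: Optional[str] = None  # sense data waiting to be presented

    def reset(self) -> None:
        # The CHPID went away and came back: a Reset Notification is queued.
        self.pending_sense = RESET_NOTIFICATION

    def read_sense(self) -> Optional[str]:
        # The guest reads the sense data; the fix clears it at this point.
        sense = self.pending_sense
        if self.clear_sense_on_read:
            self.pending_sense = None
        return sense

    def start_io(self) -> Optional[str]:
        # Rediscovery I/O runs into any sense condition that is still pending.
        return self.pending_sense


def guest_recovery_attempts(path: SimulatedChannelPath,
                            max_attempts: int = 5) -> Optional[int]:
    """Return the recovery pass on which rediscovery I/O succeeds,
    or None if the guest is still looping when the cap is reached."""
    for attempt in range(1, max_attempts + 1):
        path.read_sense()
        if path.start_io() is None:
            return attempt
    return None


buggy = SimulatedChannelPath(clear_sense_on_read=False)
buggy.reset()
fixed = SimulatedChannelPath(clear_sense_on_read=True)
fixed.reset()

buggy_result = guest_recovery_attempts(buggy)
fixed_result = guest_recovery_attempts(fixed)
```

In this model the path that never clears its sense data leaves the guest looping until the cap is hit, while the path that clears the sense on the first read lets rediscovery complete on the first pass. The cap exists only so the sketch terminates; the real condition has no such limit.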
Temporary fix
FOR RELEASE VM/ESA CP/ESA R640 :
PREREQ: NONE
CO-REQ: NONE
IF-REQ: NONE
FOR RELEASE VM/ESA CP/ESA R710 :
PREREQ: NONE
CO-REQ: NONE
IF-REQ: NONE
Comments
APAR Information
APAR number
VM66306
Reported component name
VM CP
Reported component ID
568411202
Reported release
640
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-07-23
Closed date
2019-09-05
Last modified date
2020-12-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UM35524 UM35525
Modules/Macros
HCPIOS HCPIOV
Fix information
Fixed component name
VM CP
Fixed component ID
568411202
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
12 January 2021