APAR status
Closed as program error.
Error description
During backup, the Windows iSCSI initiator on the SQL server acts as the iSCSI client (initiator), and the vSnap acts as the iSCSI server (target). If the vSnap server is slow or overloaded, it cannot handle all incoming I/O requests quickly enough, and the initiator experiences timeouts. The initiator logs a warning or error in the Windows event log, and at the same time the vSnap logs the following message in /var/log/messages:

<timestamp> <vSnapHost> kernel: Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <WindowsHostIQN>,i,0x400001370000,<vSnapHostIQN>,t,0x01

The initiator immediately aborts and then retries the request. If the vSnap is too overloaded or the network is slow, the retries can keep failing, in which case the initiator treats the LUN as disconnected.

The Red Hat Linux kernel used in the vSnap packages for virtual hosts has a defect that sometimes causes the target to hang when a request is aborted. Reference: Bug 2156588 - Incorrect target abort handling causes iscsi deadlock.
https://access.redhat.com/solutions/6992332

Once the target is in the hung state, all subsequent requests fail. A reboot is required to clear the hang.

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.x

Additional Keywords: SPP, SPPLUS, TS012054712, iSCSI, timeout, hang
Local fix
Reduce the workload on the vSnap to mitigate the iSCSI timeout occurrences.
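To judge whether reducing the workload is lowering the timeout rate, the kernel message quoted in the error description can be counted per initiator. The following is a minimal Python sketch, not an IBM-provided tool. The function names are hypothetical, and the regular expression assumes that every occurrence follows the layout of the single sample line in this APAR (initiator IQN immediately after "I_T Nexus", followed by ",i,").

```python
import re
from collections import Counter

# Pattern derived from the one sample message quoted in this APAR.
# Assumption: the initiator IQN is the first field after "I_T Nexus"
# and is terminated by ",i,".
TIMEOUT_RE = re.compile(
    r"kernel: Unable to recover from DataOut timeout while in ERL=0, "
    r"closing iSCSI connection for I_T Nexus (?P<initiator>[^,\s]+),i,"
)


def count_dataout_timeouts(lines):
    """Return a Counter mapping initiator IQN -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("initiator")] += 1
    return counts


def count_in_file(path="/var/log/messages"):
    """Count timeout messages in a log file (hypothetical helper).

    Undecodable bytes are replaced so a stray binary sequence in the
    log does not abort the scan.
    """
    with open(path, errors="replace") as log:
        return count_dataout_timeouts(log)
```

Calling count_in_file() on the vSnap (reading /var/log/messages typically requires root) and printing the result of .most_common() lists the initiators with the most timeouts first. A count that keeps growing after the workload has been reduced suggests the mitigation is not sufficient.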
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.x                       *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem is fixed *
* in IBM Spectrum Protect Plus levels 10.1.15.2 and 10.1.16.   *
* Note that this is subject to change at the discretion of     *
* IBM.                                                         *
****************************************************************
Problem conclusion
Prior to IBM Spectrum Protect Plus 10.1.15.2 and 10.1.16, if a vSnap became overloaded and failed to handle I/O quickly enough, the iSCSI target on the vSnap could hang indefinitely after the initiator aborted a request, requiring a restart of the vSnap. Starting with IBM Spectrum Protect Plus 10.1.15.2 and 10.1.16, the Red Hat kernel has been upgraded to include a fix for this issue, which resolves the iSCSI hang.
Temporary fix
Comments
APAR Information
APAR number
IT43284
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A1C
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-03-06
Closed date
2023-09-14
Last modified date
2023-09-14
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
vSnap iSCSI
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
01 February 2024