APAR status
Closed as program error.
Error description
IBM Spectrum Protect Plus replication fails with "TRANSFER FAILED: [ERRNO 32] BROKEN PIPE" due to a network time-out. This defect is similar to APAR IT29759, except that the replication job was not cancelled.

In the replication job log, you will see errors similar to the following:

[2019-10-16 17:20:24,165] ERROR pid-27501 vsnap.repld Session 41: worker failed: Transfer failed: Disconnected from partner ssrebrvsnp901: [Errno 32] Broken pipe
[2019-10-16 17:20:24,175] INFO pid-27501 vsnap.replication.session Session 41: status = FAILED
[2019-10-16 17:20:24,180] INFO pid-27501 vsnap.replication.config Relationship fb256ebdbfa36c3d68b2ee6672861711: last sync status = FAILED

On the problem vSnap system, you may see "zfs recv" processes still running that were not cleaned up because of a network time-out.

IBM Spectrum Protect Plus versions affected: 10.1.4

Customer/L2 Diagnostics (If Applicable)

Initial Impact: HIGH

Additional Keywords: zfs recv time-out Sppsup-1168 timeout TS002806393
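To check whether a job log shows this symptom, the log can be scanned for the "worker failed ... [Errno 32] Broken pipe" line. The following is a minimal sketch in Python, not an IBM-supplied tool. The line layout (timestamp, level, pid, logger name, message) is an assumption inferred only from the sample lines in this APAR, and the names find_broken_pipe_events and BrokenPipeEvent are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred only from the sample lines in this APAR:
# [timestamp] LEVEL pid-N logger.name message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+pid-(?P<pid>\d+)\s+"
    r"(?P<logger>\S+)\s+(?P<message>.*)$"
)

# The failure signature quoted in the error description.
BROKEN_PIPE = re.compile(
    r"Session (?P<session>\d+): worker failed: .*\[Errno 32\] Broken pipe"
)


class BrokenPipeEvent(NamedTuple):
    timestamp: str
    pid: int
    session: int


def find_broken_pipe_events(lines):
    """Return one event per ERROR line that reports a broken-pipe transfer failure."""
    events = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if parsed is None or parsed["level"] != "ERROR":
            continue  # not a log line in the assumed layout, or not an error
        hit = BROKEN_PIPE.search(parsed["message"])
        if hit:
            events.append(
                BrokenPipeEvent(
                    parsed["timestamp"], int(parsed["pid"]), int(hit["session"])
                )
            )
    return events
```

Passing the lines of a job log to find_broken_pipe_events returns the affected session numbers, which can then be matched against the "status = FAILED" entries for the same session.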
Local fix
Ensure that no replication jobs are running. Then run "sudo pkill -f 'zfs recv'" on both the source and the target replication vSnaps.
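Before running the pkill command, it can be useful to list which processes it would match. The following Python sketch approximates the "pkill -f" match with a plain substring test against the full command line reported by "ps -eo pid,args". The function names are hypothetical, and the sketch only lists PIDs; it does not terminate anything.

```python
import subprocess


def find_matching_pids(ps_output, pattern="zfs recv"):
    """Return PIDs whose full command line contains `pattern`.

    `ps_output` is expected to be the text printed by `ps -eo pid,args`.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        pid, args = fields
        if pattern in args:
            pids.append(int(pid))
    return pids


def list_leftover_zfs_recv():
    """Run ps on the local system and return the PIDs of 'zfs recv' processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_matching_pids(result.stdout)
```

Run list_leftover_zfs_recv() on each vSnap only after confirming that no replication jobs are active, because an active replication session also uses "zfs recv" processes.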
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.4.                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.4.277 and 10.1.5. Note that this is subject to change   *
* at the discretion of IBM.                                    *
****************************************************************
Problem conclusion
When a vSnap replication session failed, under certain conditions the data transfer pipe was not gracefully closed. This resulted in some hung processes being left behind on the replication target vSnap. During subsequent replication attempts, this would result in a "broken pipe" error.

This problem was previously seen in APAR IT29759. Under that APAR, some fixes were made to ensure that the transfer pipes were closed gracefully. However, those fixes were incomplete because they only addressed replication cancellation plus certain failure conditions. There are other failure conditions, particularly network disconnections, where the previous fixes did not take effect. As a result, the "broken pipe" errors were still seen.

These remaining problems have been resolved under the current APAR. At the start of each replication session, the primary vSnap now checks the target vSnap and looks for any leftover processes with open pipes that may have been left behind by previous failed sessions. These leftover processes are then automatically terminated before proceeding with the new replication session.
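The ordering described above (check the target, terminate leftovers, then start the transfer) can be illustrated with the following Python sketch. It is not the vSnap implementation. The function start_replication_session and its three callable parameters are hypothetical hooks introduced only to show the sequence.

```python
def start_replication_session(list_leftovers, terminate, run_transfer):
    """Clean up leftovers from earlier failed sessions, then run the transfer.

    All three arguments are hypothetical hooks, not vSnap APIs:
      list_leftovers() -> iterable of process IDs found on the target
      terminate(pid)   -> stops one leftover process
      run_transfer()   -> performs the new replication transfer
    """
    cleaned = []
    for pid in list_leftovers():
        terminate(pid)  # cleanup happens before any new data is sent
        cleaned.append(pid)
    result = run_transfer()
    return cleaned, result
```

Because the cleanup runs at the start of every session, it also covers failures that gave the previous session no chance to close its pipes, such as a network disconnection.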
Temporary fix
Comments
APAR Information
APAR number
IT30729
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A14
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-10-29
Closed date
2019-11-13
Last modified date
2019-12-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
30 January 2024