A fix is available
APAR status
Closed as program error.
Error description
In rare cases, the 'Backup VM' process halts due to an invalid pointer situation. This was observed on SLES Linux.

Products affected:
IBM Spectrum Protect for Virtual Environments: Data Protection for VMware version 7.1.8 and 8.1 on Linux x86 and Windows x64 platform
IBM Spectrum Protect for Virtual Environments: Data Protection for Microsoft Hyper-V version 7.1.8 and 8.1 on Windows x64 platform

If you are using Data Protection for VMware 8.1, refer to APAR IT26212.
If you are using Data Protection for Microsoft Hyper-V 8.1.4-8.1.6, refer to APAR IT26761.
If you are using Data Protection for VMware 7.1.8 or Data Protection for Microsoft Hyper-V 7.1.8 or 8.1.0-8.1.2, refer to APAR IT26762.

Note 1: The Backup-Archive Client is a prerequisite to using the Data Protection for VMware version 7.1. In Data Protection for VMware environments, the Backup-Archive Client is also known as the data mover.

Note 2: The Backup-Archive Client is a prerequisite to using the Data Protection for Microsoft Hyper-V versions 7.1 through 8.1.2. In Data Protection for Microsoft Hyper-V environments, the Backup-Archive Client is also known as the data mover.

Customer/L2 Diagnostics

In a data mover client service,VM trace, we can see that the guest 'guest_1' is put into the list of failed VMs, but at that point it is not yet handled.

<timestamp> [PID] [TID_1] : vmOverlappedIO.cpp (2910): OverlappedIOMonitor::KillVM(): error happened on consumer thread, abandoning backup for vm 'guest_1'

The Consumer Thread [TID_2] was working with vm 'guest_2', which has a failure and is therefore also put in the list that should be handled by the OverlappedIOMonitor Thread [TID_1].

<timestamp> [PID] [TID_2] : vmbackvddk.cpp (13360): EXIT <===== vmGetObjInfoDisk(), rc = 0
<timestamp> [PID] [TID_2] : vmbackcommon.cpp (5684): VmVerifyIfSingleDisk(): Found disk: Hard Disk 2
<timestamp> [PID] [TID_2] : vmbackvddk.cpp (16952): VmGetDiskNumFromLabel: disk num '2' for label 'Hard Disk 2'.
<timestamp> [PID] [TID_2] : vmbackcommon.cpp (5698): VmVerifyIfSingleDisk(): Verifying disk backup ctls: checking size on disk vs ctl size coverage: Hard Disk 2.
<timestamp> [PID] [TID_2] : vmbackcommon.cpp (5300): VmVerifyIfDiskBackup(): Num of CTLs = 0; type = IFINCR
.. : No ctl files found!
.. : VM / Disk : guest_2 / Hard Disk 2
.. : capacity : 2147483648
.. : size on disk : 2147483648
.. : ctl coverage size : 0
.. : disk included : Yes
.. : prev backup ifincr: Yes
.. : ctl matches size : No
.. : ctl found : No
.. : bitmap found : No
.. : disk used : Yes
.. : result : FAIL; missing CTLs
<timestamp> [PID] [TID_2] : vmbackcommon.cpp (5496): ANS9921E Virtual machine disk, guest_2 (Hard Disk 2), verification check failed (2147483648/0). If the disk is on an NFS datastore and the disk size was recently changed or the disk was moved from non-NFS datastore to NFS datastore, the verification failure is expected and a full backup is required.
<timestamp> [PID] [TID_2] : vmbackcommon.cpp (5778): VmVerifyIfSingleDisk(): Exiting with rc 6560.
<timestamp> [PID] [TID_2] : vmbackvddk.cpp (8168): ANS9919E Failed to find the expected control files for guest_2.
<timestamp> [PID] [TID_2] : vmbackvcm.cpp ( 285): =========> Entering vcmFlushVolumeControlLibrary()
<timestamp> [PID] [TID_2] : vmbackvcm.cpp ( 206): =========> Entering vcmLogger::trace()
<timestamp> [PID] [TID_2] : vmbackvcm.cpp ( 217): ANS5250E An unexpected error was encountered.
<timestamp> [PID] [TID_2] : vmbackvcm.cpp ( 222): <========= Exiting vcmLogger::trace()
<timestamp> [PID] [TID_2] : vmbackvcm.cpp ( 295): <========= Exiting vcmFlushVolumeControlLibrary()
<timestamp> [PID] [TID_2] : vmbackvddk.cpp (8222): VmSendData(): VmVerifyIfSingleDisk() returned rc=4379
<timestamp> [PID] [TID_2] : vmbackvddk.cpp (9033): VmSendData(): vm guest_2 has 1 totaldisks entries to dispatch.
<timestamp> [PID] [TID_2] : vmbackvddk.cpp (9074): VmSendData(): we had an error, telling the IO Monitor to stop backing up this VM.

In the last line, the client puts the VM into the list of failed VMs. After that, the Consumer Thread [TID_2] destroys the pointer that was used in the message to the OverlappedIOMonitor Thread [TID_1].

<timestamp> [PID] [TID_2] : vmbackvddk.cpp (14921): vmBackupVMCleanup(): free vmBackupDataPP

However, in the meantime, the OverlappedIOMonitor Thread [TID_1] handled the "kill vm" message and used the invalid pointer, which had already been destroyed by the Consumer Thread [TID_2]:

<timestamp> [PID] [TID_1] : vmOverlappedIO.cpp (2910): OverlappedIOMonitor::KillVM(): error happened on consumer thread, abandoning backup for vm 'guest_1'

At the moment the message was handled, the same pointer was still pointing to vm 'guest_1', which causes the processing to halt.

Initial Impact: Medium

Additional Keywords: TS000989064 halt hang backup
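The ordering described above (the consumer thread posts a message, frees the data the message refers to, and only then does the monitor thread handle the message) can be illustrated with a small sketch. This is not the product code and not a description of the actual fix, which this APAR does not detail. BackupContext, handle_kill_vm and simulate are hypothetical names, and Python cannot reproduce a real dangling pointer, so a "valid" flag stands in for freed memory. The sketch only contrasts a message that refers to sender-owned state with a message that carries its own copy of the VM name.

```python
import queue
from dataclasses import dataclass


@dataclass
class BackupContext:
    """Hypothetical per-VM state owned by a consumer thread."""
    vm_name: str
    valid: bool = True

    def release(self) -> None:
        # Stands in for freeing the structure in native code.
        self.valid = False
        self.vm_name = "<freed>"


def handle_kill_vm(payload) -> str:
    """Hypothetical monitor-side handler for a 'kill vm' message."""
    if isinstance(payload, BackupContext):
        # The message refers to state that the sender may already have released.
        if not payload.valid:
            return "invalid context"
        return f"abandon {payload.vm_name}"
    # The message carries its own copy of the VM name.
    return f"abandon {payload}"


def simulate(send_copy: bool) -> str:
    """Replay the ordering from the trace: post, clean up, then handle."""
    mailbox = queue.Queue()
    ctx = BackupContext("guest_2")
    mailbox.put(ctx.vm_name if send_copy else ctx)  # consumer reports the failure
    ctx.release()                                   # consumer cleans up its data
    return handle_kill_vm(mailbox.get())            # monitor handles the message later


print(simulate(send_copy=False))  # prints "invalid context"
print(simulate(send_copy=True))   # prints "abandon guest_2"
```

In native code the first case is undefined behavior rather than a clean "invalid context" result, which is consistent with the halt and the stale 'guest_1' name seen in the trace.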
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* Data Protection for VMware version 7.1.8 and 8.1 on Linux    *
* x86 and Windows x64 platform                                 *
* Data Protection for Microsoft Hyper-V version 7.1.8 and 8.1  *
* on Windows x64 platform                                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* see ERROR DESCRIPTION                                        *
****************************************************************
* RECOMMENDATION:                                              *
* This issue is projected to be fixed in the Backup-Archive    *
* Client version 7.1.8.5 on all Microsoft Windows x64 and      *
* Linux x86 platforms.                                         *
* Note that this is subject to change at the discretion of     *
* IBM.                                                         *
*                                                              *
* Note 1: The Backup-Archive Client is a prerequisite to using *
* the Data Protection for VMware version 7.1.                  *
* In Data Protection for VMware environments, the              *
* Backup-Archive Client is also known as the data mover.       *
*                                                              *
* Note 2: The Backup-Archive Client is a prerequisite to using *
* the Data Protection for Microsoft Hyper-V versions 7.1       *
* through 8.1.2.                                               *
* In Data Protection for Microsoft Hyper-V environments, the   *
* Backup-Archive Client is also known as the data mover.       *
****************************************************************
Problem conclusion
The code has been changed so that the data mover no longer halts in this situation.
Temporary fix
Comments
APAR Information
APAR number
IT26762
Reported component name
TSM CLIENT
Reported component ID
5698ISMCL
Reported release
71L
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-10-26
Closed date
2018-11-02
Last modified date
2018-11-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
dsmc
Fix information
Fixed component name
TSM CLIENT
Fixed component ID
5698ISMCL
Applicable component levels
R71W PSY
UP
R71L PSY
UP
Document Information
Modified date:
28 September 2021