Direct links to fixes
7.1.9.000-TIV-TSMCMS-Windows64
7.1.9.000-TIV-TSMCMS-Windows32
7.1.9.000-TIV-TSMCMS-Linuxx64
7.1.9.000-TIV-TSMOC-LinuxS390
7.1.9.000-TIV-TSMOC-LinuxPPC64
7.1.9.000-TIV-TSMOC-Windows
7.1.9.000-TIV-TSMOC-Linuxx64
7.1.9.000-TIV-TSMOC-AIX
7.1.9.000-TIV-TSMSRV-WIN
7.1.9.000-TIV-TSMSRV-SolarisSPARC
7.1.9.000-TIV-TSMSRV-Linuxx86_64
7.1.9.000-TIV-TSMSRV-Linuxs390x
7.1.9.000-TIV-TSMSRV-Linuxppc64
7.1.9.000-TIV-TSMSRV-HP-UX
7.1.9.000-TIV-TSMSRV-AIX
8.1.5.000-IBM-SPCMS-WindowsX64
8.1.5.000-IBM-SPCMS-WindowsI32
8.1.5.000-IBM-SPCMS-Linuxx86_64
8.1.5.000-IBM-SPOC-WindowsX64
8.1.5.000-IBM-SPOC-Linuxx86_64
8.1.5.000-IBM-SPOC-Linuxs390x
8.1.5.000-IBM-SPOC-LinuxPPC64le
8.1.5.000-IBM-SPOC-AIX
8.1.5.000-IBM-SPSRV-WindowsX64
8.1.5.000-IBM-SPSRV-Linuxx86_64
8.1.5.000-IBM-SPSRV-Linuxs390x
8.1.5.000-IBM-SPSRV-Linuxppc64le
8.1.5.000-IBM-SPSRV-AIX
IBM Spectrum Protect Server V8.1 Fix Pack 5 (V8.1.5) Downloads
IBM Spectrum Protect Server V7.1 Fix Pack 9 (7.1.9.000) Downloads
APAR status
Closed as program error.
Error description
In a replication environment, the source server can appear to be in a hang or wait condition. Commands such as 'QUERY PROCESS' or 'select from processes' do not return any output, and 'QUERY SESSION' shows the replication sessions in status IdleW for a long time. In the IBM Spectrum Protect server monitoring data (the show.txt outputs), you can see an admin session that is waiting to acquire a mutex:

Thread 7286, Parent 7258: SmAdminCommandThread, Storage 108500, AllocCnt 101 HighWaterAmt 434840
tid=e676, ptid=d85a, det=0, zomb=0, join=0, result=0, sess=0, procToken=0, sessToken=3725
Stack trace:
  0x090000000056683c _global_lock_common
  0x0900000000574108 _mutex_lock
  0x0000000100007ed8 pkAcquireMutexTracked
  0x0000000100281084 NrQueryCounts
  0x00000001000d0c9c procQueryProcess
  0x0000000101154044 AdmQueryProcess
  0x0000000100636fc0 AdmCommandLocal
  0x0000000100634978 admCommand
  0x000000010064e448 PreFlushDataForSQL
  0x000000010064d494 IPRA.$ScrubCmdInput
  0x0000000100645a84 IPRA.$PreProcessQuery
  0x0000000100648f94 AdmSQLExecute
  0x0000000100636fc0 AdmCommandLocal
  0x0000000100634978 admCommand
  0x000000010099c7d4 SmAdminCommandThread
  0x000000010000e300 StartThread
Holding mutex PROCV->mutex (0x11141ffd8), acquired at process.c(1152)
Holding mutex descP->tableMutex (0x130e1f8f8), acquired at output.c(1935)
Acquiring mutex ctlP->fsArrayMutex (0x11f2add98) at nrmain.c(13409) <=== the thread is waiting here to acquire the mutex
Thread context:
  COMMAND: QUERY PROCESS
  COMMMETHOD: SSL
  THREAD_TYPE: SESSION
  SESSION_TYPE: ADMIN
  ADMIN_NAME: AAAA

The replication sessions are also waiting for the same mutex, for example:

Thread 475, Parent 468: NrReplicateFilespace, Storage 9010964, AllocCnt 545528 HighWaterAmt 9083686
tid=43db, ptid=34d4, det=0, zomb=0, join=0, result=0, sess=115, procToken=2, sessToken=106
Stack trace:
  0x090000000056683c _global_lock_common
  0x0900000000574108 _mutex_lock
  0x0000000100007ed8 pkAcquireMutexTracked
  0x0000000100289050 NrReplicateFilespace
  0x000000010083d440 PcConsumerThread
  0x000000010000e300 StartThread
Acquiring mutex ctlP->fsArrayMutex (0x11f2add98) at nrmain.c(5162) <=== the thread is waiting here to acquire the mutex
Thread context:
  COMMAND: REPLICATE NODE
  SCHEDULE_TYPE: ADMIN
  SCHEDULE_NAME: REPLICATE_ALL_NODE_INITIAL
  PROCESS_NUMBER: 2
  PROCESS_DESC: Replicate Node
  THREAD_TYPE: PROCESS
  SCHEDULED: YES

One replication thread holds this mutex:

Thread 468, Parent 466: NrReplicationThread, Storage 6550965781, AllocCnt 126580 HighWaterAmt 6794766785
tid=34d4, ptid=32d2, det=1, zomb=0, join=0, result=0, sess=177, procToken=2, sessToken=106
Stack trace:
  0x0900000000589260 _cond_wait_global
  0x0900000000589df8 _cond_wait
  0x090000000058aae0 pthread_cond_wait
  0x00000001000095b4 pkWaitConditionTracked
  0x00000001002c63d8 EnqueueVarQueue
  0x00000001008401a4 ProdConsPutWork
  0x00000001002bd450 IPRA.$MakeTapeBatch
  0x000000010028ece8 IPRA.$ProcessFsCompletion
  0x0000000100283adc NrReplicationThread
  0x000000010000e300 StartThread
Holding mutex ctlP->fsArrayMutex (0x11f2add98), acquired at nrmain.c(3375) ===> here the mutex is held
Awaiting cond newQueue->notFull (0x113f47ff0), using mutex newQueue->mutex (0x11f601df8), at queue.c(1743)
Thread context:
  COMMAND: REPLICATE NODE
  SCHEDULE_TYPE: ADMIN
  SCHEDULE_NAME: REPLICATE_ALL_NODE_INITIAL
  THREAD_TYPE: PROCESS
  PROCESS_DESC: Replicate Node
  PROCESS_NUMBER: 2
  SCHEDULED: YES

Customer/L2 Diagnostics:
If the target replication server does not have enough mount points or volumes to satisfy all of the sessions that store data on the target server, the source server may hang. The hang is caused by a thread that holds a mutex for processes while it waits for a mutex for a specific process. Other threads, which are themselves holding other resources, then wait for the process mutex that the first thread holds.

IBM Spectrum Protect Server Version Affected: Version 7.1.x and above on all platforms
Initial Impact: High
Additional Keywords: TSM server Spectrum Protect hang freeze replication repl process
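The traces above show a hold-and-wait cycle: NrReplicationThread holds ctlP->fsArrayMutex while waiting for space in a bounded work queue (newQueue->notFull), and the NrReplicateFilespace consumer threads that would drain that queue are themselves waiting for ctlP->fsArrayMutex. The following Python sketch models that pattern in miniature. It is an illustration only, not IBM's server code and not necessarily how the fix was implemented: every name in it (simulate, fs_array_mutex, work_queue, release_before_put) is hypothetical, the queue starts full to stand in for consumers stalled on target-side media waits, and timeouts are added so the hang becomes observable instead of permanent.

```python
import queue
import threading


def simulate(release_before_put: bool, timeout: float = 0.5) -> dict:
    """Hypothetical model of a producer holding a mutex while blocked on a full queue.

    fs_array_mutex stands in for ctlP->fsArrayMutex; work_queue stands in for
    the bounded queue whose notFull condition the replication thread waits on.
    """
    fs_array_mutex = threading.Lock()
    work_queue = queue.Queue(maxsize=1)
    work_queue.put("batch-0")  # queue already full: consumers have stalled

    result = {"producer_put": False, "consumer_got_mutex": False}
    producer_has_mutex = threading.Event()

    def producer() -> None:
        # Like NrReplicationThread: take the mutex, then try to queue more work.
        fs_array_mutex.acquire()
        producer_has_mutex.set()
        if release_before_put:
            fs_array_mutex.release()  # avoid blocking on the queue while holding it
        try:
            # Wait up to 2 * timeout for free space (the notFull wait).
            work_queue.put("batch-1", timeout=2 * timeout)
            result["producer_put"] = True
        except queue.Full:
            pass
        finally:
            if not release_before_put:
                fs_array_mutex.release()

    def consumer() -> None:
        # Like NrReplicateFilespace: needs the mutex before it can drain work.
        producer_has_mutex.wait()
        # Shorter timeout than the producer's put, so in the holding case the
        # consumer gives up while the producer still owns the mutex.
        if fs_array_mutex.acquire(timeout=timeout):
            result["consumer_got_mutex"] = True
            fs_array_mutex.release()
            work_queue.get_nowait()  # frees a slot, unblocking the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


print(simulate(release_before_put=False))  # each side waits on the other until timeouts
print(simulate(release_before_put=True))
```

With release_before_put=False, neither thread makes progress: the consumer cannot get the mutex, so it never drains the queue, and the producer's put times out. With release_before_put=True, the consumer gets the mutex, frees a queue slot, and the producer's put succeeds. Releasing the shared lock before blocking on a bounded queue is one common way to break this kind of cycle; in the scenario this APAR describes, adding capacity on the target side (mount points or volumes) relieves the same cycle from the consumer side.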
Local fix
Resolving the media wait on the target server (so that it has enough mount points or volumes for the replication sessions) resolves this hang on the source server.
Problem summary
USERS AFFECTED: All Spectrum Protect server users.
PROBLEM DESCRIPTION: See error description.
RECOMMENDATION: Apply fixing level when available. This problem is currently projected to be fixed in levels 7.1.9 and 8.1.5. Note that this is subject to change at the discretion of IBM.
Problem conclusion
This problem was fixed. Affected platforms: AIX, HP-UX, Solaris, Linux and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT23192
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-11-23
Closed date
2018-02-13
Last modified date
2018-02-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
R81A PSY
UP
R81L PSY
UP
R81W PSY
UP
Document Information
Modified date:
27 September 2021