Direct links to fixes
APAR status
Closed as program error.
Error description
The server can crash when retrieving a fragment from a container pool if the fragment has no extents. On Linux, getcoreinfo.txt for a crash during a REPLICATE NODE process looks like this:

  #0 0x0000000000ca4948 in sdRtrvFragment (sessH=0x1720a838, inTxnId=<optimized out>, objectId=26248244, chunkType=SdChunkTypeDedup, objectOffset=0, objectLength=0, sinkFunc=0xc9db30 <SdEndToEndSinkFunc>, contextP=0x7f40d0a1cbe8, fragSeq=4, numFragments=9, fragHdr=0, thisIsPOR=False, bytesTransferredP=0x7f4044f047a8) at sdrtrv.c:1161
  #1 0x0000000000708fad in bfRtrv (sessHandle=0xcac9fd8, bfId=26239633, bfOffset=0, bfLength=0, mountWaitMode=bfWaitMount, rtrvType=bfExternalRtrv, noQueryRestore=False, sinkFunc=0xda8fc0 <SmSendData>, contextP=0x7f40d0a1cbe8, doRetry=True, thisPoolOnly=0, thisStrategyOnly=0, thisFragOnly=False, bfSize=17821889536, isFragmented=True, bytesTransferredP=0x7f4044f047a8) at bfrtrv.c:1475
  #2 0x0000000000e3e235 in smReplRtrv (handle=0x7f40a1a78958, bfHandle=0xcac9fd8, objId=26239633, bfSize=17821889536, hdrLen=426, metaSize=194, copyType=<optimized out>, doRetry=True, bytesTransferredP=0x7f4044f047a8, lastRetry=False, isSuperAggregate=True, isSdObject=True, isSdTarget=True, isCloudTarget=False) at smrepl.c:6062
  #3 0x00000000009ac621 in NrReplicateBatch (argP=0x834f1e8, workP=0x7f40c058b088) at nrmain.c:12147
  #4 0x0000000000fbc025 in PcConsumerThread (argP=<optimized out>) at prodcons.c:633
  #5 0x000000000103ac62 in StartThread (startInfoP=0x0) at pkthread.c:3779
  #6 0x00007f40ecb8f806 in start_thread () from /lib64/libpthread.so.0
  #7 0x00007f40e8d8864d in clone () from /lib64/libc.so.6
  #8 0x0000000000000000 in ?? ()

Diagnostics:

Frame #0 of the getcoreinfo.txt output (sdRtrvFragment) identifies the object being retrieved: objectId=26248244, with fragSeq=4 and numFragments=9. Run:

  show invo 26248244 listchunks=yes

The output shows that the object is a fragment of a super aggregate:

  SHOW INVO 26248244 LISTCHUNKS=YES
  Object 26248244 NOT FOUND.
  Bitfile Object: 26248244
  **Super-Bitfile 26248244 is a fragment in Super Aggregate 26239633
  Bitfile Object NOT found.

Then run show invo for the super aggregate:

  show invo 26239633 listchunks=yes

The output lists all fragment IDs:

  ***** Fragment Information *****
  Super-Bitfile 26239633 is a Super Aggregate with 9 fragments.
  Fragment ID: 26239633 Sequence Number: 0 User Bytes: 2003814235, pendingId: -1
  Fragment ID: 26242866 Sequence Number: 1 User Bytes: 2003813744, pendingId: -1
  Fragment ID: 26244298 Sequence Number: 2 User Bytes: 2003863706, pendingId: -1
  Fragment ID: 26245769 Sequence Number: 3 User Bytes: 2003813744, pendingId: -1
  Fragment ID: 26248244 Sequence Number: 4 User Bytes: 2003763508, pendingId: -1
  Fragment ID: 26250091 Sequence Number: 5 User Bytes: 2003813744, pendingId: -1
  Fragment ID: 26251787 Sequence Number: 6 User Bytes: 2003863706, pendingId: -1
  Fragment ID: 26253245 Sequence Number: 7 User Bytes: 2003813704, pendingId: -1
  Fragment ID: 26255398 Sequence Number: 8 User Bytes: 1791329062, pendingId: -1

Then in DB2 run:

  db2 "select objid,count(*) from tsmdb1.sd_recon_order where objid in (26239633, 26242866, 26244298, 26245769, 26248244, 26250091, 26251787, 26253245, 26255398) group by objid"

  OBJID                2
  -------------------- -----------
              26239633       40073
              26242866       40072
              26244298       40073
              26245769       40072
              26250091       40072
              26251787       40073
              26253245       40072
              26255398       35823

Eight of the nine fragments are returned. Fragment 26248244 (sequence number 4, matching fragSeq=4 in frame #0) is not listed because it has no extents. The super aggregate must therefore be deleted to prevent the crash from recurring.

IBM Spectrum Protect Versions Affected: IBM Spectrum Protect Server 7.1.3.x and higher on all platforms

Initial Impact: Medium

Additional Keywords: TSM IBM Spectrum Protect container pools crash core extents fragment 117481
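The comparison described in the diagnostics (fragment IDs reported by SHOW INVO versus object IDs returned from tsmdb1.sd_recon_order) can be sketched in Python. This is an illustrative helper only, not an IBM-supplied tool: the function names are hypothetical, the parsing assumes the output layouts shown in this APAR, and the sample text is abridged from the outputs in the error description.

```python
import re

# Matches lines such as:
#   Fragment ID: 26248244 Sequence Number: 4 User Bytes: ..., pendingId: -1
FRAGMENT_RE = re.compile(r"Fragment ID:\s*(\d+)\s+Sequence Number:\s*(\d+)")


def parse_fragments(show_invo_output):
    """Return {fragment_id: sequence_number} from SHOW INVO LISTCHUNKS=YES text."""
    return {int(fid): int(seq) for fid, seq in FRAGMENT_RE.findall(show_invo_output)}


def build_query(fragment_ids):
    """Build the db2 command that counts sd_recon_order rows per fragment."""
    ids = ", ".join(str(i) for i in fragment_ids)
    return ('db2 "select objid,count(*) from tsmdb1.sd_recon_order '
            f'where objid in ({ids}) group by objid"')


def parse_db2_rows(db2_output):
    """Return {objid: row_count}; header and separator lines are skipped."""
    rows = {}
    for line in db2_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            rows[int(parts[0])] = int(parts[1])
    return rows


def fragments_without_extents(fragments, rows):
    """Fragments known to SHOW INVO that the DB2 query did not return."""
    return sorted(fid for fid in fragments if fid not in rows)


# Sample data abridged from the outputs in this APAR.
SHOW_INVO = """
Fragment ID: 26239633 Sequence Number: 0 User Bytes: 2003814235, pendingId: -1
Fragment ID: 26242866 Sequence Number: 1 User Bytes: 2003813744, pendingId: -1
Fragment ID: 26244298 Sequence Number: 2 User Bytes: 2003863706, pendingId: -1
Fragment ID: 26245769 Sequence Number: 3 User Bytes: 2003813744, pendingId: -1
Fragment ID: 26248244 Sequence Number: 4 User Bytes: 2003763508, pendingId: -1
Fragment ID: 26250091 Sequence Number: 5 User Bytes: 2003813744, pendingId: -1
Fragment ID: 26251787 Sequence Number: 6 User Bytes: 2003863706, pendingId: -1
Fragment ID: 26253245 Sequence Number: 7 User Bytes: 2003813704, pendingId: -1
Fragment ID: 26255398 Sequence Number: 8 User Bytes: 1791329062, pendingId: -1
"""

DB2_OUTPUT = """
OBJID 2
-------------------- -----------
26239633 40073
26242866 40072
26244298 40073
26245769 40072
26250091 40072
26251787 40073
26253245 40072
26255398 35823
"""

fragments = parse_fragments(SHOW_INVO)
rows = parse_db2_rows(DB2_OUTPUT)
missing = fragments_without_extents(fragments, rows)
for fid in missing:
    print(f"fragment {fid} (sequence {fragments[fid]}) has no extents")
```

With the sample data, the only fragment reported is 26248244 at sequence 4, the same object and fragSeq that appear in frame #0 of the stack trace. Deleting the affected super aggregate still requires IBM support, as noted under Local fix.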
Local fix
Contact IBM support for assistance in deleting the affected super aggregate object.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All Tivoli Storage Manager server users.                     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See error description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in level 7.1.6. Note that this is      *
* subject to change at the discretion of IBM.                  *
****************************************************************
Problem conclusion
This problem was fixed. Affected platforms: AIX, Solaris, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT15717
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
71L
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2016-06-14
Closed date
2016-06-16
Last modified date
2016-06-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
R71A PSY
UP
R71L PSY
UP
R71S PSY
UP
R71W PSY
UP
Document Information
Modified date:
25 September 2021