IBM Support

IT42145: MIGRATION AND CLIENT SESSIONS HANG IF ANR8944E FLUSH IO ERROR HAPPENS WHILE THE MIGRATION IS USING TWO OR MORE TAPE DRIVES

APAR status

  • Closed as program error.

Error description

  • If an IBM Spectrum Protect server process is using two or more
    tape volumes and one volume gets a flush I/O error like the
    following:
    
    ANR8944E A hardware or media error occurred on drive RMTXX
    (/dev/rmtxx) with volume VOLXXX, OP=FLUSH, error number= 110,
    CC=0, KEY=03, ASC=09, ASCQ=00,
    SENSE=F0.00.03.00.00.00.00.58.00.00.00.00.09.00-
    .86.0A.50.35.00.00.00.03.01.31.08.70.90.02.A0.B9.10.1F.6-
    E.50.12.1F.AB.58.09.23.48.58.09.00.26.00.00.11.C7.12.00.-
    73.55.03.00.00.00.02.59.14.03.07.2E.35.00.50.2E.60.18.E0-
    .20.00.43.38.45.20.20.20.20.00.A0.43.4A.D1.F1.F1.F2.F0.F-
    6.81.00.00.3F.04.71.07, Description=An undetermined error has
    occurred). For input/output (I/O) error code descriptions, see
    the IBM Spectrum Protect documentation.
    
    ANR8359E Media fault detected on 3592 volume J11206 in drive
    RMTXX (/dev/rmtxx) of library LIBRXX
    
    
    then a deadlock can occur that hangs the affected process.
    
    All client sessions that need to use the same tape storage pool
    also hang, waiting on a lock.
    The problem has mainly been observed with migration processes
    from disk to tape storage pools.
    When the hang happens, the problem can be confirmed as follows:
    
    
    In the SHOW THREAD command output, the affected migration thread
    "AfMigrVolumeThread" is waiting on a lock:
    
     Thread 212888, Parent 212881: AfMigrVolumeThread, Storage
       336304, AllocCnt 2128501 HighWaterAmt 2950272
     tid=17c98, ptid=17591, det=1, zomb=0, join=0, result=0,
       sess=0, procToken=23, sessToken=122054
      Stack trace:
           0x09000000005d89a0 _cond_wait_global
           0x09000000005d969c _cond_wait
           0x09000000005da00c pthread_cond_wait
           0x000000010000b2b4 pkWaitConditionTracked
           0x00000001002bc00c IPRA.$WaitForLock
           0x00000001002ba330 tmLockTracked
           0x0000000100791358 AsLockVolRootTracked
           0x000000010076bbf4 AsSetVolWriteError
           0x00000001007b5c6c AsPrepareOutput
           0x000000010079b954 AsPrepareTxn
           0x00000001005e8814 ssPrepareTxn
           0x00000001001116d4 CollectVotes
           0x0000000100110aac tmEndX
           0x0000000100ed10e8 IPRA.$MoveBatch
           0x0000000100edab64 IPRA.$MoveCluster
           0x0000000100ed9de0 IPRA.$MoveGroup
           0x0000000100ecb3ec IPRA.$MoveVolumeCollocated
           0x0000000100eca75c AfMoveVolume
           0x0000000101088154 IPRA.$MigrOnsiteVols
           0x0000000101086ecc AfMigrVolumeThread
           0x00000001000114b0 StartThread
     Awaiting cond waitP->waiting (0x211979df0), using mutex
      TMV->mutex (0x11169d7e0), at tmlock.c(2528)
        Thread context:
              COMMAND: MIGRATE STGPOOL
              SCHEDULE_TYPE: ADMIN
              SESSION: 141178
              SCHEDULE_NAME: zzzzzzz
              PROCESS_DESC: MIGRATION
              THREAD_TYPE: PROCESS
              PROCESS_NUMBER: 23
              SCHEDULED: YES
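As a rough aid for checking a saved copy of the SHOW THREAD output for this pattern, the following Python sketch lists threads whose stack trace contains both AsSetVolWriteError and a WaitForLock frame. The function name, the sample text, and the assumption that every thread block starts with a line of the form "Thread <id>, Parent ..." are illustrative and taken only from the excerpt above; actual output may be laid out differently.

```python
import re

# Assumed header format, inferred from the excerpt above (not a documented format).
THREAD_HEADER = re.compile(r"\s*Thread (\d+), Parent")

# Frames that appear together in the hung thread's stack in the excerpt.
SUSPECT_FRAMES = ("AsSetVolWriteError", "WaitForLock")

SAMPLE_SHOW_THREAD = """
 Thread 100, Parent 1: SomeOtherThread, Storage 1024
  Stack trace:
       0x0000000000000001 pthread_cond_wait
       0x0000000000000002 StartThread
 Thread 212888, Parent 212881: AfMigrVolumeThread, Storage
  Stack trace:
       0x00000001002bc00c IPRA.$WaitForLock
       0x00000001002ba330 tmLockTracked
       0x0000000100791358 AsLockVolRootTracked
       0x000000010076bbf4 AsSetVolWriteError
       0x0000000101086ecc AfMigrVolumeThread
"""

def find_suspect_threads(show_thread_text):
    """Return IDs of threads whose stack holds every frame in SUSPECT_FRAMES."""
    blocks = {}      # thread id -> lines that follow its header
    current = None
    for line in show_thread_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return [
        thread_id
        for thread_id, lines in blocks.items()
        if all(any(frame in text for text in lines) for frame in SUSPECT_FRAMES)
    ]

print(find_suspect_threads(SAMPLE_SHOW_THREAD))
```

A match only narrows down candidates; the lock ownership still has to be confirmed from the SHOW LOCK output.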
    
    In the SHOW LOCK command output, thread 212888 from above already
    holds a lock of type "36001", and the same thread is also waiting
    on the lock it already holds.
    
    
    LockDesc: Type=36001(as volume root), NameSpace=0,
    SummMode=ixLock, Key=''
     Holder: (astxn.c:3785 Thread 212888) Tsn=0:16751409,
    Mode=ixLock
     Waiter: (asvolacq.c:7379 Thread 225397) Tsn=0:16768752,
    Mode=sixLock
     Waiter: (asvolacq.c:2428 Thread 225401) Tsn=0:16768816,
    Mode=ixLock
     Waiter: (asvol.c:4693 Thread 213207) Tsn=0:16768819,
    Mode=ixLock
     Waiter: (asvolut.c:1429 Thread 203) Tsn=0:16768824, Mode=isLock
    
       ... many more waiters may appear ...
     Waiter: (asvolacq.c:2428 Thread 226537) Tsn=0:16786970,
    Mode=ixLock
     Waiter: (asvolacq.c:2428 Thread 226553) Tsn=0:16787274,
    Mode=ixLock
     Waiter: (asvol.c:2223 Thread 212888) Tsn=0:16803182,
    Mode=ixLock
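The condition described above, a thread listed both as Holder and as Waiter of the same lock, can be checked mechanically on a saved SHOW LOCK output. The Python sketch below is one way to do that, assuming the layout shown in the excerpt ("LockDesc:" starts a lock entry, and each "Holder:" or "Waiter:" line carries "Thread <id>" inside parentheses). The function name and the shortened sample are hypothetical.

```python
import re

# Matches e.g. "Holder: (astxn.c:3785 Thread 212888)" and captures role and thread id.
ENTRY = re.compile(r"(Holder|Waiter):\s*\(\S+\s+Thread\s+(\d+)\)")

SAMPLE_SHOW_LOCK = """
LockDesc: Type=36001(as volume root), NameSpace=0,
SummMode=ixLock, Key=''
 Holder: (astxn.c:3785 Thread 212888) Tsn=0:16751409,
Mode=ixLock
 Waiter: (asvolacq.c:7379 Thread 225397) Tsn=0:16768752,
Mode=sixLock
 Waiter: (asvol.c:2223 Thread 212888) Tsn=0:16803182,
Mode=ixLock
"""

def find_self_waiters(show_lock_text):
    """Return (lock description line, thread ids) for locks whose holder also waits."""
    locks = []
    for line in show_lock_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("LockDesc:"):
            # Start a new lock entry; later Holder/Waiter lines belong to it.
            locks.append({"desc": stripped, "Holder": set(), "Waiter": set()})
            continue
        entry = ENTRY.search(stripped)
        if entry and locks:
            role, thread_id = entry.group(1), int(entry.group(2))
            locks[-1][role].add(thread_id)
    return [
        (lock["desc"], sorted(lock["Holder"] & lock["Waiter"]))
        for lock in locks
        if lock["Holder"] & lock["Waiter"]
    ]

for desc, threads in find_self_waiters(SAMPLE_SHOW_LOCK):
    print(desc, threads)
```

The sketch compares thread IDs only; it does not interpret lock modes or transaction numbers.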
    
    If a trace can be collected with the flags PVR MMS, it shows the
    following:
    
    [astxn.c][926][AsPrepareTxn]:Prepare for txn commit or abort;
     index 1.
    [asalloc.c][10425][FlushVolume]:Flushing data to volume
     VOLXXX(1723087).
    [pvr.c][3617][pvrStartFlush]:Starting flush.
    [asalloc.c][10430][FlushVolume]:pvrFlush rc=2813.
    [tmtxn.c][524][tmBeginNamed]:Transaction 0:316679254 is
     starting from asvol.c(2217).
    [tmlock.c][769][tmLockTracked]:Tid=0:316679254,
     Type=36001(as volume root), NameSpace=0, Key=(nil) ,Mode=ixLock
    (UnCond) from asvol.c:2223
    [tmlock.c][2456][WaitForLock]:Entering
    [tmutil.c][427][tmGetDynWaitTime]:Enter, txnId
    14fab46c0 waitTime 16599ee70
    [tmutil.c][442][tmGetDynWaitTime]:Exit, rc=0, *waitTime=0
    [tmlock.c][2512][WaitForLock]:admResUpdate returned 0.
    
    The signature of the problem is "pvrFlush rc=2813" followed by a
    "tmLockTracked" request for lock type 36001 in ixLock mode.
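To search a collected trace file for this signature, a small script along the following lines may help. It is a sketch under stated assumptions: trace entries start with "[" and may wrap onto continuation lines, as in the excerpt above, and the function names and sample are hypothetical. It does not correlate entries by thread, so in a busy trace with interleaved threads a match is only a hint.

```python
SAMPLE_TRACE = """
[asalloc.c][10425][FlushVolume]:Flushing data to volume
 VOLXXX(1723087).
[pvr.c][3617][pvrStartFlush]:Starting flush.
[asalloc.c][10430][FlushVolume]:pvrFlush rc=2813.
[tmtxn.c][524][tmBeginNamed]:Transaction 0:316679254 is
 starting from asvol.c(2217).
[tmlock.c][769][tmLockTracked]:Tid=0:316679254,
 Type=36001(as volume root), NameSpace=0, Key=(nil) ,Mode=ixLock
(UnCond) from asvol.c:2223
[tmlock.c][2456][WaitForLock]:Entering
"""

def join_entries(trace_text):
    """Merge wrapped continuation lines into the trace entry they belong to."""
    entries = []
    for line in trace_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[") or not entries:
            entries.append(stripped)
        else:
            entries[-1] += " " + stripped
    return entries

def has_signature(trace_text):
    """True if pvrFlush rc=2813 is later followed by an ixLock request on type 36001."""
    flush_failed = False
    for entry in join_entries(trace_text):
        if "pvrFlush rc=2813" in entry:
            flush_failed = True
        elif (
            flush_failed
            and "[tmLockTracked]" in entry
            and "Type=36001" in entry
            and "Mode=ixLock" in entry
        ):
            return True
    return False

print(has_signature(SAMPLE_TRACE))
```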
    
    
    IBM Spectrum Protect Versions Affected:
       V8.1 on all supported platforms
    
    Additional Keywords: TS008454697  deadlock AsSetVolWriteError()
    

Local fix

  • Run migration with a single drive.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users.                       *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    *                                                              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available.                           *
    * This problem is currently projected to be fixed in levels    *
    * 8.1.16.100 and 8.1.17.                                       *
    * Note that this is subject to change at the discretion of     *
    * IBM.                                                         *
    ****************************************************************
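For scripted inventory checks, the projected fix levels above can be encoded as a simple comparison. The Python sketch below assumes server levels are written as dotted numbers such as "8.1.16.100", treats missing trailing fields as zero, and assumes every level at or above 8.1.17 carries the fix. The function names are hypothetical, and the fix levels themselves are projections that IBM may change.

```python
def parse_level(level):
    """Turn '8.1.17' or '8.1.16.100' into a 4-tuple of ints, padding with zeros."""
    parts = [int(part) for part in level.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts[:4])

def has_projected_fix(level):
    """True if the level is 8.1.16.100 or later in 8.1.16, or 8.1.17 or later."""
    version = parse_level(level)
    if version >= (8, 1, 17, 0):
        return True
    # Within the 8.1.16 stream only interim levels from .100 onward are projected.
    return version[:3] == (8, 1, 16) and version[3] >= 100

for level in ("8.1.15.200", "8.1.16.0", "8.1.16.100", "8.1.17"):
    print(level, has_projected_fix(level))
```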
    

Problem conclusion

  • This problem was fixed.
    Affected platforms: AIX, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT42145

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-09-26

  • Closed date

    2022-10-20

  • Last modified date

    2022-10-20

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • Business Unit: IBM Infrastructure w/TPS (BU058)
  • Product: Tivoli Storage Manager (SSGSG7)
  • Platform: Platform Independent (PF025)
  • Version: 81A
  • Line of Business: Storage (LOB26)

Document Information

Modified date:
09 December 2022