IBM Support

IT36608: SPACE MANAGEMENT MIGHT EXPERIENCE A DEADLOCK SITUATION WITH DSMRECALLD DAEMON OR DSMMIGRATE COMMAND

APAR status

  • Closed as program error.

Error description

  • Under certain timing conditions, a migrate or recall operation
    might deadlock while trying to create a temporary file in the
    /fs/.SpaceMan/logdir/ directory while it simultaneously holds
    an exclusive right on the file object that is being migrated
    or recalled on the same HSM-managed file system.
    
    Customer/Support Diagnostics (If Applicable):
    
    The following script might reveal hung dsmmigrate or dsmrecalld
    processes reported in the system log (e.g.
    "/var/log/messages" on Linux):
    
    # $10 is the task name when log lines use the timestamp format shown in this APAR
    for i in /var/log/messages*; do
      who=$(grep "blocked for more than" "$i" | awk '{gsub(":", " ", $0); print $10}' | sort | uniq -c)
      # $who is left unquoted so that all counts print on one line
      echo $i, $who
    done
    ...
    /var/log/messages, 4 dsmmigrate 1 dsmrecalld
    ...
    
    The system log (e.g. "/var/log/messages" on Linux) might also
    contain dsmmigrate or dsmrecalld related messages like the
    following, which show the task hanging inside the Linux kernel
    filename_lookup() VFS call:
    ...
    2021-03-29T00:13:38.403410+11:00 hostname kernel: [6364412.376579] INFO: task dsmmigrate:73490 blocked for more than 120 seconds.
    2021-03-29T00:13:38.403428+11:00 hostname kernel: [6364412.378576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    2021-03-29T00:13:38.403430+11:00 hostname kernel: [6364412.380530] dsmmigrate      D ffff88103df99208     0 73490 73488 0x00000080
    2021-03-29T00:13:38.403431+11:00 hostname kernel: [6364412.380536]  ffff880493f7f440 0000000000000086 ffff88054ffa5ee0 ffff880493f7ffd8
    2021-03-29T00:13:38.403432+11:00 hostname kernel: [6364412.380540]  ffff880493f7ffd8 ffff880493f7ffd8 ffff88054ffa5ee0 ffff88103df99218
    2021-03-29T00:13:38.403433+11:00 hostname kernel: [6364412.380542]  0000000000000000 ffff88054ffa5ee0 ffffffffc0c75760 ffff88103df99208
    2021-03-29T00:13:38.403434+11:00 hostname kernel: [6364412.380546] Call Trace:
    2021-03-29T00:13:38.403436+11:00 hostname kernel: [6364412.380556]  [<ffffffff816a94e9>] schedule+0x29/0x70
    2021-03-29T00:13:38.403437+11:00 hostname kernel: [6364412.380581]  [<ffffffffc070dee1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
    ...
    2021-03-29T00:13:38.403460+11:00 hostname kernel: [6364412.380980]  [<ffffffff8120f34b>] filename_lookup+0x2b/0xc0
    2021-03-29T00:13:38.403461+11:00 hostname kernel: [6364412.380983]  [<ffffffff81212ec7>] user_path_at_empty+0x67/0xc0
    2021-03-29T00:13:38.403476+11:00 hostname kernel: [6364412.380985]  [<ffffffff81212f31>] user_path_at+0x11/0x20
    2021-03-29T00:13:38.403477+11:00 hostname kernel: [6364412.380990]  [<ffffffff81206473>] vfs_fstatat+0x63/0xc0
    2021-03-29T00:13:38.403478+11:00 hostname kernel: [6364412.380993]  [<ffffffff81206a41>] SYSC_newlstat+0x31/0x60
    2021-03-29T00:13:38.403479+11:00 hostname kernel: [6364412.380999]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
    2021-03-29T00:13:38.403480+11:00 hostname kernel: [6364412.381002]  [<ffffffff81206cce>] SyS_newlstat+0xe/0x10
    2021-03-29T00:13:38.403482+11:00 hostname kernel: [6364412.381007]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
    ...
    2021-03-30T08:13:38.416027+11:00 hostname kernel: [6479609.674271] INFO: task dsmrecalld:161985 blocked for more than 120 seconds.
    2021-03-30T08:13:38.416047+11:00 hostname kernel: [6479609.676145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    2021-03-30T08:13:38.416048+11:00 hostname kernel: [6479609.677961] dsmrecalld      D ffff881594fc71c8     0 161985 161960 0x00000080
    2021-03-30T08:13:38.416074+11:00 hostname kernel: [6479609.677967]  ffff8813cabb3400 0000000000000086 ffff88201eb3eeb0 ffff8813cabb3fd8
    2021-03-30T08:13:38.416075+11:00 hostname kernel: [6479609.677972]  ffff8813cabb3fd8 ffff8813cabb3fd8 ffff88201eb3eeb0 ffff881594fc71d8
    2021-03-30T08:13:38.416077+11:00 hostname kernel: [6479609.677975]  0000000000000000 ffff88201eb3eeb0 ffffffffc0c75eb0 ffff881594fc71c8
    2021-03-30T08:13:38.416078+11:00 hostname kernel: [6479609.677980] Call Trace:
    2021-03-30T08:13:38.416079+11:00 hostname kernel: [6479609.677991]  [<ffffffff816a94e9>] schedule+0x29/0x70
    2021-03-30T08:13:38.416080+11:00 hostname kernel: [6479609.678013]  [<ffffffffc070dee1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
    ...
    2021-03-30T08:13:38.417064+11:00 hostname kernel: [6479609.678439]  [<ffffffff8120f34b>] filename_lookup+0x2b/0xc0
    2021-03-30T08:13:38.417065+11:00 hostname kernel: [6479609.678442]  [<ffffffff81212ec7>] user_path_at_empty+0x67/0xc0
    2021-03-30T08:13:38.417065+11:00 hostname kernel: [6479609.678454]  [<ffffffffc06fe98f>] ? ss_fs_unlocked_ioctl+0x13f/0x530 [mmfslinux]
    2021-03-30T08:13:38.417066+11:00 hostname kernel: [6479609.678457]  [<ffffffff81212f31>] user_path_at+0x11/0x20
    2021-03-30T08:13:38.417067+11:00 hostname kernel: [6479609.678461]  [<ffffffff81206473>] vfs_fstatat+0x63/0xc0
    2021-03-30T08:13:38.417068+11:00 hostname kernel: [6479609.678465]  [<ffffffff812069de>] SYSC_newstat+0x2e/0x60
    2021-03-30T08:13:38.417069+11:00 hostname kernel: [6479609.678468]  [<ffffffff81200dfa>] ? vfs_write+0x17a/0x1e0
    2021-03-30T08:13:38.417070+11:00 hostname kernel: [6479609.678472]  [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
    2021-03-30T08:13:38.417071+11:00 hostname kernel: [6479609.678476]  [<ffffffff81206cbe>] SyS_newstat+0xe/0x10
    2021-03-30T08:13:38.417071+11:00 hostname kernel: [6479609.678481]  [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
    ...
    2021-03-30T08:14:04.121036+11:00 hostname dsmwatchd: HSM(pid:9824): Restart local recall service. Reason: Recall service hang
    ...
    
    IBM Spectrum Protect Versions Affected: Space Management Client
    for Unix and Linux 7.1.x and 8.1.x on all supported platforms
    
    Initial Impact: Low
    
    Additional Keywords: IBM Spectrum Protect; HSM; TS004921636;
    recall; deadlock; dsmrecalld; migrate
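
The log checks described above can also be sketched as two small shell helpers that match on the message text instead of on field positions. This is a minimal sketch, not an IBM-provided tool: the function names hsm_hung_tasks and hsm_recall_restarts are hypothetical, and the patterns assume that the kernel hung-task and dsmwatchd messages look like the excerpts shown in this APAR.

```shell
#!/bin/sh
# Hypothetical helpers; message formats are assumed to match the
# log excerpts in this APAR and might differ on other systems.

# Print "<process name> <count>" for each dsm* task that the kernel
# reported as blocked in the given log file(s).
hsm_hung_tasks() {
  grep -h "blocked for more than" "$@" 2>/dev/null |
    sed -n 's/.*INFO: task \(dsm[a-z]*\):[0-9]* blocked for more than.*/\1/p' |
    sort | uniq -c | awk '{print $2, $1}'
}

# Print how many times dsmwatchd restarted the local recall service
# according to the given log file(s).
hsm_recall_restarts() {
  cat "$@" 2>/dev/null | grep -c "Restart local recall service"
}
```

For example, after sourcing the functions, `hsm_hung_tasks /var/log/messages*` prints one line per affected process name with its number of hung-task reports, and `hsm_recall_restarts /var/log/messages*` prints a single count.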
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect for Space Management (HSM) client       *
    * versions 7.1.x and 8.1.x on all platforms                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is projected *
    * to be fixed in IBM Spectrum Protect for Space Management     *
    * client level 8.1.13. Note that this is subject to change at  *
    * the discretion of IBM.                                       *
    ****************************************************************
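
To check whether a given client level is at or above the projected fix level, a version comparison along the following lines could be used. This is a sketch under stated assumptions: hsm_level_has_fix is a hypothetical function name, the installed level is passed in by hand as a dotted string, "sort -V" requires GNU coreutils or a compatible sort, and 8.1.13 is only the level projected in this APAR, which might change.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given dotted client level is at
# or above the projected fix level.
FIX_LEVEL="8.1.13"   # projected in APAR IT36608; subject to change by IBM

hsm_level_has_fix() {
  level=$1
  # sort -V orders version strings numerically per component, so the
  # fix level sorts first exactly when level >= fix level.
  lowest=$(printf '%s\n%s\n' "$FIX_LEVEL" "$level" | sort -V | head -n 1)
  [ "$lowest" = "$FIX_LEVEL" ]
}
```

A call such as `hsm_level_has_fix 8.1.12` returns a non-zero exit status, so it can be used directly in an `if` statement.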
    

Problem conclusion

  • After the fix, the IBM Spectrum Protect for Space Management
    recall function will work as expected.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT36608

  • Reported component name

    TSM SPACE MGMT

  • Reported component ID

    5698HSMCL

  • Reported release

    81L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-04-27

  • Closed date

    2021-06-08

  • Last modified date

    2021-06-08

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • dsmrecal
    

Fix information

  • Fixed component name

    TSM SPACE MGMT

  • Fixed component ID

    5698HSMCL

Applicable component levels

Document Information

Modified date:
09 June 2021